MI-Prometheus

The core concepts of MI-Prometheus. Dotted elements indicate optional inputs/outputs/dataflows.

Core concepts

When training a model, people write programs which typically follow a similar pattern:

  • Loading data samples & instantiating the model,
  • Feeding the model batches of sample-label pairs, which go through the model's forward pass,
  • Computing the loss, a measure of the difference between the predicted labels and the ground truth labels,
  • Propagating this error backwards through the model (backpropagation),
  • Updating the model parameters using an optimizer.

During each iteration, the program can also collect statistics (such as the training / validation loss & accuracy) and optionally save the weights of the resulting model to a file.
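The pattern above can be sketched in a few lines of plain Python. This is a minimal illustration, not MI-Prometheus code: the toy dataset (y = 2x + 1), the function names `make_dataset`, `batches` and `train`, and the hand-written gradient step standing in for a real optimizer are all assumptions made for the example.

```python
import random


def make_dataset(n=64, seed=0):
    """Toy data for y = 2x + 1 (hypothetical stand-in for a real dataset)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    return [(x, 2.0 * x + 1.0) for x in xs]


def batches(samples, batch_size):
    """Yield consecutive batches of sample-label pairs."""
    for i in range(0, len(samples), batch_size):
        yield samples[i:i + batch_size]


def train(samples, epochs=200, batch_size=8, lr=0.1):
    w, b = 0.0, 0.0          # model parameters of a linear model y = w * x + b
    history = []             # statistics: mean training loss per epoch
    for _ in range(epochs):
        epoch_loss = 0.0
        for batch in batches(samples, batch_size):
            # Forward pass: predict a label for every sample in the batch.
            preds = [w * x + b for x, _ in batch]
            # Loss: mean squared error between predictions and ground truth.
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            loss = sum(e * e for e in errors) / len(batch)
            # Backward pass: gradients of the loss w.r.t. w and b.
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # Parameter update: plain gradient descent in place of an optimizer.
            w -= lr * grad_w
            b -= lr * grad_b
            epoch_loss += loss * len(batch)
        history.append(epoch_loss / len(samples))
    return w, b, history


w, b, history = train(make_dataset())
```

After training, `w` and `b` should be close to 2 and 1, and `history` holds the per-epoch loss that a real program would log or plot.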

This typical workflow led us to the formalization of the core concepts of the framework:

  • Problem: A dataset or a data generator, returning batches of inputs and ground truth labels used to train, validate or test a model,
  • Model: A trainable model (e.g. a neural network),
  • Worker: A specialized application that instantiates the Problem & Model objects and controls the interactions between them, e.g. during training or inference,
  • Configuration file(s): YAML file(s) containing the parameters of the Problem, Model and training procedure,
  • Experiment: A single run (training & validation or test) of a given Model on a given Problem, using a specific Worker and Configuration file(s).
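To make the division of responsibilities concrete, here is a minimal sketch of how these concepts could fit together. It does not reproduce the MI-Prometheus API: the class names `Problem`, `Model` and `Trainer`, their methods, and the configuration keys are hypothetical, and a plain dictionary stands in for the parsed YAML configuration file.

```python
import random


class Problem:
    """Hypothetical data generator returning batches of (inputs, targets)."""

    def __init__(self, params):
        self.batch_size = params["batch_size"]
        self.rng = random.Random(params["seed"])

    def next_batch(self):
        xs = [self.rng.uniform(-1.0, 1.0) for _ in range(self.batch_size)]
        return xs, [3.0 * x for x in xs]   # toy target: y = 3x


class Model:
    """Hypothetical trainable model with a single weight: y = w * x."""

    def __init__(self, params):
        self.w = params["initial_weight"]

    def forward(self, xs):
        return [self.w * x for x in xs]


class Trainer:
    """Hypothetical worker: builds the Problem and Model, then drives training."""

    def __init__(self, config):
        self.problem = Problem(config["problem"])
        self.model = Model(config["model"])
        self.lr = config["training"]["learning_rate"]
        self.steps = config["training"]["steps"]

    def run(self):
        loss = None
        for _ in range(self.steps):
            xs, ys = self.problem.next_batch()
            preds = self.model.forward(xs)
            errors = [p - y for p, y in zip(preds, ys)]
            loss = sum(e * e for e in errors) / len(errors)
            grad = sum(2 * e * x for e, x in zip(errors, xs)) / len(errors)
            self.model.w -= self.lr * grad   # gradient descent step
        return loss


# Stand-in for the content of a YAML configuration file (assumed layout).
config = {
    "problem": {"batch_size": 16, "seed": 0},
    "model": {"initial_weight": 0.0},
    "training": {"learning_rate": 0.5, "steps": 200},
}

worker = Trainer(config)      # one experiment: one worker, one config
final_loss = worker.run()
```

The point of the split is that the Problem and the Model never reference each other: only the worker knows about both, so either can be swapped by editing the configuration.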

Aside from the Workers, MI-Prometheus currently offers two other types of specialized applications, namely:

  • Grid Worker: A specialized application which automates the handling of a number of experiments in parallel.
  • Helper: An application useful from the point of view of a running experiment, but which is independent and external to the Workers.

The general idea here is that the Grid Workers are useful for reproducing research, e.g. when one trains a set of independent models on a set of problems and compares the results. In such a situation, the user can run a Helper before training to download the required datasets and/or preprocess them in a specific way (e.g. extract representations), which reduces the overall time of all experiments.
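The scenario just described can be sketched as follows. This is an illustration under stated assumptions, not how MI-Prometheus implements its Grid Workers or Helpers: `prepare_data`, `run_experiment` and `run_grid` are hypothetical names, the experiment body is a placeholder that does no training, and a thread pool is used only to keep the example portable (a real grid tool would more likely launch separate processes or jobs).

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def prepare_data(problem_names):
    """Hypothetical helper step, run once before the grid starts.

    A real helper might download or preprocess datasets; here it only
    maps each problem name to a made-up cache location.
    """
    return {name: f"cache/{name}" for name in problem_names}


def run_experiment(model_name, problem_name, data_dir):
    """Placeholder for a single experiment (no actual training happens)."""
    return {
        "model": model_name,
        "problem": problem_name,
        "data": data_dir,
        "status": "finished",
    }


def run_grid(models, problems, max_workers=4):
    """Run every (model, problem) combination, several at a time."""
    data_dirs = prepare_data(problems)          # shared by all experiments
    combos = list(product(models, problems))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(run_experiment, m, p, data_dirs[p]) for m, p in combos
        ]
        # Collect results in submission order so they are easy to compare.
        return [f.result() for f in futures]


results = run_grid(["model_a", "model_b"], ["problem_x", "problem_y"])
```

Because the preparation step runs once and its output is shared, none of the four experiments repeats the download or preprocessing work.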

You can read more about MI-Prometheus here.

Vincent Marois
ML Engineer

My interests lie at the intersection of AI Research and Software Engineering.