MI-Prometheus

Sat, Dec 8, 2018


The core concepts of MI-Prometheus. Dotted elements indicate optional inputs/outputs/dataflows.

Core concepts

When training a model, people write programs which typically follow a similar pattern:

Loading data samples & instantiating the model,
Feeding the model batches of sample-label pairs, with the samples going through the model's forward pass to produce predictions,
Computing the loss, which measures the difference between the predicted labels and the ground truth labels,
Propagating this error backwards through the model using backpropagation,
Updating the model parameters using an optimizer.

During each iteration, the program can also collect statistics (such as the training / validation loss & accuracy) and optionally save the weights of the resulting model to a file.
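To make the pattern above concrete, here is a minimal sketch in plain Python. It is not MI-Prometheus code: the one-dimensional linear model, the toy dataset, and the names `make_batches` and `train` are assumptions chosen for illustration, and the gradients are derived by hand rather than by an autograd engine.

```python
def make_batches(samples, batch_size):
    """Split the dataset into consecutive batches of (input, label) pairs."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]


def train(samples, epochs=200, batch_size=4, lr=0.05):
    # Step 1: instantiate the model (assumed here: y = w * x + b).
    w, b = 0.0, 0.0
    history = []  # collected statistics: mean training loss per epoch
    for _ in range(epochs):
        epoch_loss = 0.0
        for batch in make_batches(samples, batch_size):
            # Step 2: forward pass on the inputs of the batch.
            preds = [w * x + b for x, _ in batch]
            # Step 3: loss (mean squared error) between predictions and labels.
            errors = [p - y for p, (_, y) in zip(preds, batch)]
            loss = sum(e * e for e in errors) / len(batch)
            # Step 4: propagate the error back to the parameters (chain rule by hand).
            grad_w = sum(2 * e * x for e, (x, _) in zip(errors, batch)) / len(batch)
            grad_b = sum(2 * e for e in errors) / len(batch)
            # Step 5: update the parameters (plain gradient descent as the optimizer).
            w -= lr * grad_w
            b -= lr * grad_b
            epoch_loss += loss * len(batch)
        history.append(epoch_loss / len(samples))
    return w, b, history


# Toy dataset: labels follow y = 2x + 1 exactly, with x in [-1, 1].
data = [(x / 10.0, 2 * (x / 10.0) + 1) for x in range(-10, 11)]
w, b, history = train(data)
```

On this noise-free toy data, `w` and `b` should approach 2 and 1, and the per-epoch loss in `history` should shrink. Saving the weights would amount to writing `w` and `b` to a file after training.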

This typical workflow led us to the formalization of the core concepts of the framework:

Problem: A dataset or a data generator, returning batches of inputs and ground truth labels used to train, validate or test a model,
Model: A trainable model (e.g. a neural network),
Worker: A specialized application that instantiates the Problem & Model objects and controls the interactions between them, e.g. during training or inference,
Configuration file(s): YAML file(s) containing the parameters of the Problem, Model and training procedure,
Experiment: A single run (training & validation or test) of a given Model on a given Problem, using a specific Worker and Configuration file(s).
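One way to picture how these concepts fit together is the sketch below. It is purely illustrative and does not reproduce the MI-Prometheus API: the class bodies, the method names (`generate_batch`, `forward`, `run`) and the configuration keys are hypothetical, and a plain dictionary stands in for the parsed YAML configuration file.

```python
class Problem:
    """Hypothetical Problem: generates batches of inputs and ground truth labels."""

    def __init__(self, params):
        self.batch_size = params["batch_size"]
        self.scale = params["scale"]

    def generate_batch(self, start):
        inputs = [float(start + i) for i in range(self.batch_size)]
        targets = [self.scale * x for x in inputs]
        return inputs, targets


class Model:
    """Hypothetical Model: a single trainable weight."""

    def __init__(self, params):
        self.weight = params["initial_weight"]

    def forward(self, inputs):
        return [self.weight * x for x in inputs]


class Worker:
    """Hypothetical Worker: builds the Problem and Model, then drives training."""

    def __init__(self, config):
        self.problem = Problem(config["problem"])
        self.model = Model(config["model"])
        self.iterations = config["training"]["iterations"]
        self.lr = config["training"]["learning_rate"]

    def run(self):
        loss = None
        for step in range(self.iterations):
            inputs, targets = self.problem.generate_batch(start=step % 3)
            preds = self.model.forward(inputs)
            errors = [p - t for p, t in zip(preds, targets)]
            loss = sum(e * e for e in errors) / len(errors)
            # Hand-derived gradient of the mean squared error w.r.t. the weight.
            grad = sum(2 * e * x for e, x in zip(errors, inputs)) / len(errors)
            self.model.weight -= self.lr * grad
        return loss


# Stand-in for the content of a YAML configuration file (keys are assumptions).
config = {
    "problem": {"batch_size": 4, "scale": 3.0},
    "model": {"initial_weight": 0.0},
    "training": {"iterations": 100, "learning_rate": 0.01},
}

# One Experiment: a given Model on a given Problem, run by a specific Worker.
worker = Worker(config)
final_loss = worker.run()
```

In this sketch, changing the experiment means editing `config` only. That separation of parameters from code is what the configuration files are meant to provide.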

Aside from the Workers, MI-Prometheus currently offers two other types of specialized applications, namely:

Grid Worker: A specialized application which automates the handling of a number of experiments in parallel.
Helper: An application useful from the point of view of a running experiment, but which is independent and external to the Workers.

The general idea here is that the Grid Workers are useful for reproducing research, e.g. when one trains a set of independent models on a set of problems and compares the results. In such a situation, the user can run a Helper to download the required datasets (before training) and/or preprocess them in a specific way (e.g. extract representations), which reduces the overall time of all experiments.
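The scenario above can be sketched as follows, using only the Python standard library. This is an assumption-laden illustration, not how the MI-Prometheus Grid Workers or Helpers are implemented: `prepare_datasets`, `run_experiment` and `run_grid` are hypothetical names, the experiment body is a placeholder, and a thread pool stands in for whatever parallel execution the framework actually uses.

```python
from concurrent.futures import ThreadPoolExecutor
from itertools import product


def prepare_datasets(problem_names):
    """Hypothetical Helper: runs once, before any experiment starts."""
    # Placeholder data; a real helper would download or preprocess datasets.
    return {name: list(range(5)) for name in problem_names}


def run_experiment(model_name, problem_name, dataset):
    """Hypothetical single experiment; a real one would train and evaluate."""
    return {"model": model_name, "problem": problem_name, "num_samples": len(dataset)}


def run_grid(models, problems, max_workers=4):
    """Hypothetical grid worker: one experiment per (model, problem) pair."""
    datasets = prepare_datasets(problems)  # shared by all experiments
    grid = list(product(models, problems))
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the grid.
        return list(pool.map(lambda pair: run_experiment(pair[0], pair[1], datasets[pair[1]]), grid))


results = run_grid(["model_a", "model_b"], ["problem_x", "problem_y", "problem_z"])
```

Because the preparation step runs once and its output is shared, each of the six experiments here skips it. That is the time saving described above.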

You can read more about MI-Prometheus here.

Vincent Marois

ML Engineer

My interests lie at the intersection of AI Research and Software Engineering.