Leon Luithlen

A Technical Introduction to AdaEnsemble

AdaEnsemble is a library for building adaptive ensembles, in which a multi-armed bandit algorithm chooses among machine learning models and gradually learns which ML model is best, or best given a particular context. These ensembles consist of an Ensemble object which “contains” two or more Model objects and an equal number of Distribution objects that contain the representation of the reward associated with each model. This document outlines the specifics of the different types of ensembles, models and rewards, and will be updated as the library expands.

Ensembles

The library “AdaEnsemble” contains three broad types of ensembles: “stackable” ensembles of types 1 and 2, and contextual ensembles.

“Stackable” ensembles of type 1 employ a non-contextual multi-armed bandit (MAB) algorithm to choose between ML models. Examples of such algorithms are epsilon greedy, UCB, Exp3 and Thompson Sampling, of which all but UCB are already implemented. An example use case can be found here.
“Stackable” ensembles of type 2 employ contextual MAB algorithms to choose between ML models. Importantly, they pass the same data used for contextual model selection to the model itself for prediction or transformation.
Contextual ensembles also use a contextual MAB algorithm to choose a model, but base the model selection on data that is separate from, and may have a different type to, the data passed to the models at the bottom of the hierarchy. For example, the models might be image classifiers that take a square image of a fixed size, while the ensemble chooses the classifier based on (for example) angle information.

All three types of ensembles are stackable: they can contain other ensembles as models. This results in a hierarchy of ensembles, each choosing (on selection by the layer above) a member model, which can be an ensemble in its own right or another kind of model. The only limitation is that “stackable” ensembles of types 1 and 2 cannot be mixed with contextual ensembles.

Ensemble type	Description
Stackable ensemble of type 1	Model selection does not depend on data
Stackable ensemble of type 2	Model selection depends on data
Contextual ensemble	Model selection depends on a context different from the data passed to the selected Model object
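
The stacking described above can be pictured as a recursive data structure. The following is a minimal sketch and not AdaEnsemble's actual classes: Node, Leaf and Ensemble are hypothetical names, and the bandit algorithm is reduced to a plain function that returns the ID of the chosen member.

```scala
// Hypothetical names throughout; this is not AdaEnsemble's API.
object StackingSketch {
  // Anything that can turn data of type D into an action of type A.
  sealed trait Node[D, A] { def act(data: D): A }

  // A bottom-layer model, here just a wrapped function.
  final case class Leaf[D, A](f: D => A) extends Node[D, A] {
    def act(data: D): A = f(data)
  }

  // An ensemble picks one member and delegates to it.
  // Members may be leaves or further ensembles, which gives the hierarchy.
  final case class Ensemble[D, A](members: Map[Int, Node[D, A]], select: () => Int)
      extends Node[D, A] {
    def act(data: D): A = members(select()).act(data)
  }
}
```

Because an Ensemble is itself a Node, it can be placed in the members map of another Ensemble, and a call to act on the top layer walks down the hierarchy until it reaches a Leaf.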

Ensemble Type Parameters

The different elements of an adaptive ensemble (Ensemble object, Models and Distributions that represent rewards) are tied together by type parameters. The five type parameters, not all of which apply to every object, are ModelID, Context, ModelData, ModelAction and AggregateReward.

ModelID is the type of the key by which member models of an ensemble are identified – in the demos this is always Int, but in applied settings it might also be a custom type useful for a particular application.
Context exists only in contextual ensembles and associated models. It is the type for the data on which model selection is based, for example an Array[Double].
ModelData is the type for data which is passed to the bottom layer of models for inference. In “stackable” ensembles of type 2, data of type ModelData is also used for model selection.
ModelAction is the type of the output of the bottom layer of models. It might be a Double for regression models or an Array[Int] for classification models.
AggregateReward is the type of the representation of the reward associated with each model. For a “stackable” ensemble of type 1, it has to be a subtype of SimpleDistribution, which represents reward distributions that do not depend on data. For a “stackable” ensemble of type 2, it has to be of type ConditionalDistribution[ModelData], and for a contextual ensemble it has to be of type ConditionalDistribution[Context]. ConditionalDistribution[A] represents a distribution that is conditional on data of type A.

These type parameters structure the library as a whole, and putting together an adaptive ensemble usually requires that they match for the different components.
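
To make the role of the type parameters concrete, here is a rough sketch of how they could shape model selection in the three ensemble types. All names (SimpleReward, ConditionalReward, selectType1 and so on) are assumptions made for illustration, and the "bandit" is simply a greedy argmax; the library's own signatures may differ.

```scala
// Hypothetical names throughout; this is not AdaEnsemble's API.
object TypeParameterSketch {
  // Reward representation that does not depend on data.
  trait SimpleReward { def draw: Double }
  // Reward representation that is conditional on data of type A.
  trait ConditionalReward[A] { def draw(data: A): Double }

  final case class Constant(value: Double) extends SimpleReward {
    def draw: Double = value
  }
  final case class Linear(weights: Array[Double]) extends ConditionalReward[Array[Double]] {
    def draw(data: Array[Double]): Double =
      weights.zip(data).map { case (w, x) => w * x }.sum
  }

  // Key with the highest score.
  private def argmax[K](scores: Map[K, Double]): K =
    scores.reduce((a, b) => if (b._2 > a._2) b else a)._1

  // Type 1: selection ignores the data entirely.
  def selectType1[ModelID](rewards: Map[ModelID, SimpleReward]): ModelID =
    argmax(rewards.map { case (id, r) => id -> r.draw })

  // Type 2: selection uses the same ModelData that the models receive.
  def selectType2[ModelID, ModelData](
      rewards: Map[ModelID, ConditionalReward[ModelData]],
      data: ModelData): ModelID =
    argmax(rewards.map { case (id, r) => id -> r.draw(data) })

  // Contextual: selection uses Context; ModelData goes only to the chosen model.
  def selectContextual[ModelID, Context, ModelData, ModelAction](
      rewards: Map[ModelID, ConditionalReward[Context]],
      models: Map[ModelID, ModelData => ModelAction],
      context: Context,
      data: ModelData): ModelAction = {
    val chosen = argmax(rewards.map { case (id, r) => id -> r.draw(context) })
    models(chosen)(data)
  }
}
```

The compiler enforces the matching mentioned above: in the contextual case, for instance, the rewards must be conditional on Context while the models must accept ModelData.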

Constraints on type parameters

Ensembles that use unconditional Thompson Sampling are constrained to employ an AggregateReward that is a BetaDistribution, and Ensembles that use conditional Thompson Sampling are constrained to use BayesianSampleRegressionDistribution.
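
One way to see where this constraint comes from: Thompson Sampling draws a sample from each model's reward belief and picks the model with the largest draw, so the reward representation has to support sampling. The sketch below illustrates this for binary rewards with a beta belief. BetaBelief and choose are hypothetical names, not the library's BetaDistribution, and the sampler is restricted to integer parameters to keep the example dependency-free.

```scala
import scala.util.Random

// Hypothetical names throughout; this is not AdaEnsemble's API.
object ThompsonSketch {
  // Beta(alpha, beta) belief about a model's success probability.
  final case class BetaBelief(alpha: Int = 1, beta: Int = 1) {
    // For integer parameters, the alpha-th smallest of (alpha + beta - 1)
    // uniform draws follows a Beta(alpha, beta) distribution.
    def sample(rng: Random): Double = {
      val draws = Array.fill(alpha + beta - 1)(rng.nextDouble())
      java.util.Arrays.sort(draws)
      draws(alpha - 1)
    }
    // Binary reward: a success increments alpha, a failure increments beta.
    def update(success: Boolean): BetaBelief =
      if (success) copy(alpha = alpha + 1) else copy(beta = beta + 1)
  }

  // Sample once from every belief and choose the key with the largest sample.
  def choose[K](beliefs: Map[K, BetaBelief], rng: Random): K =
    beliefs
      .map { case (k, b) => k -> b.sample(rng) }
      .reduce((x, y) => if (y._2 > x._2) y else x)
      ._1
}
```

A model with many recorded successes produces draws close to 1 and is chosen almost every time, while a model with few observations produces widely spread draws and is therefore still explored occasionally.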

Models

So far, three kinds of models are available:

Static models always return the value given to them on initialisation.
ONNX models apply an arbitrary machine learning model saved in the ONNX format.
Bayesian models are Bayesian regression models that return either the mean or a sampled value from the output distribution.

The most versatile of these are the ONNX models, as they enable the incorporation of all or almost all machine learning models developed and trained in scikit-learn, TensorFlow, PyTorch, and many other machine learning frameworks. The only downside is that ONNX does not allow continued training of the models: ONNX models are static.

Constraints on type parameters

Models impose constraints on the type parameter values that are available to an ensemble. They are summarised in the following table.

Model type	Constraints on type parameters
Static models	No constraints
ONNX models	In OnnxTensor.createTensor(env, data), data is of type ModelData and must be compatible with the createTensor function
Bayesian regression models	ModelData must be of type Array[Double] and ModelAction of type Double

Distributions

Distribution is the type any representation of the reward associated with a model must take. Such a representation can either be a SimpleDistribution in case the reward does not depend on any data, or a ConditionalDistribution[A] in case it does.

The available implementations of SimpleDistribution are the following:

MeanDouble takes the arithmetic mean of all the updates it has received
BetaDistribution is a beta distribution, used for Thompson Sampling
Exp3Reward returns its current value, but updates according to the Exp3 rule. The final reward is calculated in the Exp3 ensemble, so that ensemble state isn’t represented in the Exp3Reward
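
The update rules behind the first and last of these can be sketched as follows. This is an illustration under assumptions, not the library's code: RunningMean and Exp3Weight are hypothetical names, and the Exp3 formulas are the textbook ones with exploration parameter gamma. It also shows why part of the Exp3 computation sits at the ensemble level, since the selection probabilities need the weights of all models at once.

```scala
// Hypothetical names throughout; this is not AdaEnsemble's API.
object SimpleRewardSketch {
  // Incremental arithmetic mean of all rewards received so far.
  final case class RunningMean(mean: Double = 0.0, count: Long = 0L) {
    def update(reward: Double): RunningMean = {
      val n = count + 1
      RunningMean(mean + (reward - mean) / n, n)
    }
  }

  // Per-model Exp3 weight: multiplicative update with the reward divided
  // by the probability with which the model was selected.
  final case class Exp3Weight(weight: Double = 1.0) {
    def update(reward: Double, probability: Double, gamma: Double, arms: Int): Exp3Weight =
      Exp3Weight(weight * math.exp(gamma * (reward / probability) / arms))
  }

  // Selection probabilities: a mixture of normalised weights and uniform exploration.
  def probabilities(weights: Seq[Double], gamma: Double): Seq[Double] = {
    val total = weights.sum
    weights.map(w => (1 - gamma) * w / total + gamma / weights.size)
  }
}
```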

The available implementations of ConditionalDistribution are the following:

PointRegressionDistribution wraps the SMILE linear regression
BayesianSampleRegressionDistribution is a Bayesian regression where the value is sampled from the dependent distribution. This is used for contextual Thompson Sampling
BayesianMeanRegressionDistribution is a Bayesian regression that returns the mean value, and is thereby similar to a PointRegressionDistribution in its uses
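
The difference between the "mean" and "sample" variants can be illustrated with a one-dimensional Bayesian linear regression. This is a simplified sketch under stated assumptions, not the library's implementation: Posterior is a hypothetical name, alpha is taken to be the prior precision of the weight and beta the noise precision, and sampling is done by drawing a weight from the posterior.

```scala
import scala.util.Random

// Hypothetical names throughout; this is not AdaEnsemble's API.
object BayesianRegressionSketch {
  // Model: y ≈ w * x, prior w ~ N(0, 1/alpha), Gaussian noise with precision beta.
  final case class Posterior(alpha: Double, beta: Double,
                             sumXX: Double = 0.0, sumXY: Double = 0.0) {
    // Conjugate update: only two sufficient statistics are needed.
    def update(x: Double, y: Double): Posterior =
      copy(sumXX = sumXX + x * x, sumXY = sumXY + x * y)

    private def precision: Double = alpha + beta * sumXX
    private def meanWeight: Double = beta * sumXY / precision

    // "Mean" variant: deterministic point prediction.
    def predictMean(x: Double): Double = meanWeight * x

    // "Sample" variant: draw a weight from the posterior, then predict with it.
    def predictSample(x: Double, rng: Random): Double =
      (meanWeight + rng.nextGaussian() / math.sqrt(precision)) * x
  }
}
```

With few observations the sampled predictions vary widely, which is what drives exploration in contextual Thompson Sampling; as data accumulates, the posterior narrows and the sampled predictions approach the mean prediction.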

Constraints on type parameters

ConditionalDistribution imposes the constraint that the type parameter it takes is a subtype of Array[Double]. For a contextual ensemble, the type parameter Context must thus be a subtype of Array[Double], while for a stackable ensemble of type 2, ModelData must be a subtype of Array[Double].

Example

Here, for example, is the way to create an epsilon-greedy ensemble, which picks the model with the highest reward with probability (1 – epsilon) and randomly one of the others with probability epsilon. The ModelID type is Int, the ModelData type is Array[Double], the ModelAction type is Double and the reward distribution is MeanDouble.

1. Create a static model, which always returns the same value:

       val staticModel = new StaticModel[Int, Array[Double], MeanDouble](0.0)

2. Create a Bayesian regression model:

       val bayesianModel = new BayesianMeanRegressionModel[Int, MeanDouble](5, alpha=0.3, beta=5.0)

3. Create the rewards, one per model:

       val rewards = List(new MeanDouble, new MeanDouble)

4. Create the ensemble:

       // Look-up table from ModelID to model (assumed here to be a plain Map)
       val modelsMap = Map(0 -> staticModel, 1 -> bayesianModel)

       new GreedyEnsemble[Int, Array[Double], Double, MeanDouble](
               models=(i: Int) => modelsMap(i),
               modelKeys=() => List(0, 1),
               modelRewards=rewards,
               epsilon=0.3)
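
The selection rule that such an ensemble applies can be sketched independently of the library. The following is an illustration only: select is a hypothetical function, and the mean rewards are passed in as plain Doubles rather than MeanDouble objects.

```scala
import scala.util.Random

// Hypothetical names throughout; this is not AdaEnsemble's API.
object EpsilonGreedySketch {
  // With probability (1 - epsilon) return the key with the highest mean reward,
  // otherwise return one of the remaining keys uniformly at random.
  def select[K](meanRewards: Map[K, Double], epsilon: Double, rng: Random): K = {
    val best = meanRewards.reduce((a, b) => if (b._2 > a._2) b else a)._1
    val others = meanRewards.keys.filter(_ != best).toVector
    if (others.nonEmpty && rng.nextDouble() < epsilon) others(rng.nextInt(others.size))
    else best
  }
}
```

With epsilon = 0.3 as in the example above, the currently best model would be used for roughly 70% of requests, and the remaining 30% would go to exploring the other model.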

Applications

My hope is that the hierarchical combination of exploratory multi-armed bandits with fixed or slowly evolving neural and other machine learning algorithms has many applications.

Four I have thought of:

as insurance for an incremental learning algorithm: given that its parameters change continuously and without human supervision, it might be useful to guarantee that if its performance drops below that of a fixed model defined at the start, the fixed model will be adopted (this was the motivating example for the development of the library)
for online model selection, which might be useful e.g. in the case of underspecification
to join various models together in a context-aware fashion, making use of the best model for any given circumstance
for hyperparameter optimisation of an incremental learning algorithm: given many possible hyperparameter values, a simple solution would be to use them all and let the MAB algorithm choose the best hyperparameters by choosing the best model, given the rewards accumulated by each