user308827

Reputation: 21961

Building an ensemble model with the Python merf library

I would like to use the merf (mixed effects random forest) library in an ensemble model, e.g. with the mlens or mlxtend Python libraries. However, due to the non-traditional way in which merf's fit and predict methods are structured, I am unable to figure out how to do that:

from merf import MERF

merf = MERF()
# X: fixed-effects features, Z: random-effects covariates, clusters: cluster ids
merf.fit(X_train, Z_train, clusters_train, y_train)
y_hat = merf.predict(X_test, Z_test, clusters_test)

Is there a way I can use the merf library in an ensemble model? The issue is that building an ensemble with mlens or other ensemble libraries assumes the scikit-learn structure, where fit takes only X and y and predict takes only X, whereas merf clearly needs more inputs in both fit and predict. Here is a simplified syntax for mlens:

from mlens.ensemble import SuperLearner 
ensemble = SuperLearner()
ensemble.add(estimators)
ensemble.add_meta(meta_estimator)
ensemble.fit(X, y).predict(X)

I am not restricted to using mlens or mlxtend. Any other way to build an ensemble model with merf in it would work too.
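
To make the mismatch concrete, the closest thing I can think of is an adapter that freezes Z and clusters up front. A rough sketch (the MERFWrapper class below is my own hypothetical code, not part of merf):

from sklearn.base import BaseEstimator, RegressorMixin
from merf import MERF

class MERFWrapper(BaseEstimator, RegressorMixin):
    """Hypothetical adapter: freeze Z and clusters so that fit/predict
    expose the (X, y) / (X) signature that ensemble libraries expect."""
    def __init__(self, Z, clusters):
        self.Z = Z
        self.clusters = clusters

    def fit(self, X, y):
        self.merf_ = MERF()
        self.merf_.fit(X, self.Z, self.clusters, y)
        return self

    def predict(self, X):
        return self.merf_.predict(X, self.Z, self.clusters)

However, ensemble libraries fold and re-index X internally during fitting, so the frozen Z and clusters would no longer line up with the rows of X. Is there a sound way around this?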

Upvotes: 10

Views: 448

Answers (1)

DialFrost

Reputation: 1770

I mean, you could always just sneak merf into the data-making process :P. The majority of the data generation below is taken from the manifoldai merf example:

from merf.utils import MERFDataGenerator
import numpy as np
from mlens.ensemble import SuperLearner
from sklearn.svm import SVR
from sklearn.linear_model import Lasso
from mlens.metrics.metrics import rmse

# Data generator with known mixed-effects parameters (values from the manifoldai example)
dgm = MERFDataGenerator(m=0.6, sigma_b=np.sqrt(4.5), sigma_e=1)

# 20 clusters of each listed size for the train / known / new splits
num_clusters_each_size = 20
train_sizes = [1, 3, 5, 7, 9]
known_sizes = [9, 27, 45, 63, 81]
new_sizes = [10, 30, 50, 70, 90]

train_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(train_sizes, num_clusters_each_size)
known_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(known_sizes, num_clusters_each_size)
new_cluster_sizes = MERFDataGenerator.create_cluster_sizes_array(new_sizes, num_clusters_each_size)

# Training data plus test sets drawn from known clusters and from unseen clusters
train, test_known, test_new, training_cluster_ids, ptev, prev = dgm.generate_split_samples(train_cluster_sizes, known_cluster_sizes, new_cluster_sizes)

# The pieces merf's fit signature expects: fixed effects, random effects, cluster ids, target
X_train = train[['X_0', 'X_1', 'X_2']]
Z_train = train[['Z']]
clusters_train = train['cluster']
y_train = train['y']

Then fit and predict, with some modification from Flennerhag's mlens.ensemble superlearner.py (GitHub):

ensemble = SuperLearner()
ensemble.add([SVR(), Lasso()])
ensemble.add_meta(SVR())
# Note: this predicts on the training data, so the score below is in-sample
pred = ensemble.fit(X_train, y_train).predict(X_train)

root = rmse(y_train, pred)
print(root)

>>>

2.345318341087564
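
For an out-of-sample check, you could score the same ensemble on the generator's test_known split as well (a sketch reusing the variable names from the data-generation code above):

X_known = test_known[['X_0', 'X_1', 'X_2']]
y_known = test_known['y']

print(rmse(y_known, ensemble.predict(X_known)))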

But of course, there is a better method overall if you are not set on using merf and an ensemble library together specifically.

Keras method

from keras.models import Sequential
from keras.layers import Dense
from keras import backend
import matplotlib.pyplot as plt

# Custom RMSE metric so keras reports the same score as the mlens example
def rmse(y_true, y_pred):
    return backend.sqrt(backend.mean(backend.square(y_pred - y_true), axis=-1))

# Flatten the merf features into one long vector of scalar samples
X = X_train.to_numpy().flatten()

# Tiny network trained to reproduce its own input (X is both input and target)
model = Sequential()
model.add(Dense(2, input_dim=1, activation='relu'))
model.add(Dense(1))
model.compile(loss='mse', optimizer='adam', metrics=[rmse])
history = model.fit(X, X, epochs=500, batch_size=len(X), verbose=2)

plt.plot(history.history['rmse'])
plt.title("keras loss function")
plt.show()

(plot: the custom rmse metric decreasing over the 500 training epochs)

Do note that the X_train used here is from the previous merf code:

X_train = train[['X_0', 'X_1', 'X_2']]
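
To pull a comparable number out of the trained keras model, evaluate it in-sample the same way (a sketch; model.evaluate returns the mse loss followed by the custom rmse metric):

loss_value, rmse_value = model.evaluate(X, X, verbose=0)
print(rmse_value)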

Upvotes: -1
