Reputation: 813
I am running a ridge regression model over bootstrapped (resampled) data. For the sake of this question, say there are two bootstrapped samples stored in a list of dataframes. However, when I iterate over the list, I get only one output instead of two, one per dataframe. I am not sure what I am missing in my code.
Below are the sample datasets:
import pandas as pd
import numpy as np
# the resampled datasets
d1 = {'v1': [2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8],
'v2': [3.5, 3.8, 2.5, 4.0, 4.0, 3.5, 3.8, 2.5, 4.0, 4.0, 3.5, 3.8, 2.5, 4.0, 4.0, 3.8, 3.89, 2.75, 4.5, 4.25],
'v3': [4.5, 3.8, 3.5, 4.2, 4.3, 1.5, 2.98, 3.5, 3.5, 4.5, 3.8, 3.89, 2.75, 4.5, 4.25, 3.55, 3.85, 2.98, 4.05, 4.50]}
df1 = pd.DataFrame(d1)
d2 = {'v1': [2.6, 4.0, 3.3, 4.0, 3.0, 2.5, 4.5, 3.3, 4.0, 3.8, 4.5, 3.8, 3.5, 4.2, 4.3, 4.25, 3.55, 3.85, 2.98, 4.05],
'v2': [3.8, 3.89, 2.75, 4.5, 4.25, 3.55, 3.85, 2.98, 4.05, 4.50, 3.5, 2.98, 3.5, 3.25, 4.25, 4.0, 4.0, 3.5, 3.8, 2.5],
'v3': [4.0, 3.85, 3.75, 4.0, 4.73, 3.5, 2.98, 3.5, 3.25, 4.25, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5]}
df2 = pd.DataFrame(d2)
dflst = [df1, df2]
And here is the code I am running on them:
from sklearn.linear_model import Ridge
# function to run ridge regression
def ridgereg(data, ynum=1):
    y = np.asarray(data.iloc[:, 0:ynum])
    X = np.asarray(data.iloc[:, ynum:])
    model = Ridge(alpha=1.0).fit(X, y)
    return model.coef_
# iterate over list of dfs
for x in range(1, len(dflst)):
    resampled_model = {}
    resampled_model[x] = ridgereg(dflst[x], ynum=1)
print(resampled_model)
Upvotes: 1
Views: 42
Reputation: 13939
In the for loop, you are creating a new dict at each iteration, throwing the previously made dict away. In addition, range(1, len(dflst)) starts at index 1, so dflst[0] is never visited.
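To see the effect in isolation, here is a minimal sketch that uses plain strings as hypothetical stand-ins for the dataframes (`samples`, `broken`, and `fixed` are illustrative names, not from the question). It also shows that `range(1, len(...))` begins at index 1, so the first element is skipped:

```python
# Plain strings stand in for the dataframes so the sketch is self-contained.
samples = ["df1", "df2"]

# Pattern from the question: the dict is re-created on every pass,
# and range(1, n) never visits index 0.
for x in range(1, len(samples)):
    broken = {}
    broken[x] = samples[x]

# Corrected pattern: create the dict once, then visit every index.
fixed = {}
for x in range(len(samples)):
    fixed[x] = samples[x]

print(broken)  # {1: 'df2'}
print(fixed)   # {0: 'df1', 1: 'df2'}
```

With only two samples the loop in the question runs exactly once, which is why a single entry appears.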
Try (using enumerate):
resampled_model = {} # note that it is outside the loop
for i, df in enumerate(dflst, start=1):
    resampled_model[i] = ridgereg(df, ynum=1)
print(resampled_model)
# {1: array([[0.35603345, 0.1373456 ]]), 2: array([[ 0.08019198, -0.10895105]])}
Instead of the for loop, you can use a dict comprehension:
resampled_model = {i: ridgereg(df, ynum=1) for i, df in enumerate(dflst, start=1)}
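If the numbering is unclear, the following sketch shows what `enumerate(..., start=1)` feeds into the comprehension. The `describe` function is a hypothetical placeholder for `ridgereg`, and the strings stand in for the dataframes, so this runs without pandas or scikit-learn:

```python
samples = ["df1", "df2"]  # hypothetical stand-ins for the dataframes

def describe(sample):
    # Placeholder for ridgereg: it only tags its input.
    return f"coef({sample})"

# enumerate(..., start=1) yields (1, "df1"), (2, "df2"),
# so every sample gets its own key, counted from 1.
result = {i: describe(s) for i, s in enumerate(samples, start=1)}
print(result)  # {1: 'coef(df1)', 2: 'coef(df2)'}
```

The keys are just labels; dropping `start=1` would number the samples from 0 instead.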
Upvotes: 1