GSA

Reputation: 813

Looped Regression Model Returning Single Result, Instead of Multiple Outputs

I am running a ridge regression model over some bootstrapped (resampled) data. For the sake of this question, say there are two bootstrapped samples stored in a list of DataFrames. However, when I iterate over the list, I get only one output instead of one output per DataFrame. I'm not sure what I am missing in my code.

Below are the sample datasets:

import pandas as pd
import numpy as np

# the resampled datasets
d1 = {'v1': [2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8],
      'v2': [3.5, 3.8, 2.5, 4.0, 4.0, 3.5, 3.8, 2.5, 4.0, 4.0, 3.5, 3.8, 2.5, 4.0, 4.0, 3.8, 3.89, 2.75, 4.5, 4.25],
      'v3': [4.5, 3.8, 3.5, 4.2, 4.3, 1.5, 2.98, 3.5, 3.5, 4.5, 3.8, 3.89, 2.75, 4.5, 4.25, 3.55, 3.85, 2.98, 4.05, 4.50]}
df1 = pd.DataFrame(d1)


d2 = {'v1': [2.6, 4.0, 3.3, 4.0, 3.0, 2.5, 4.5, 3.3, 4.0, 3.8, 4.5, 3.8, 3.5, 4.2, 4.3, 4.25, 3.55, 3.85, 2.98, 4.05],
      'v2': [3.8, 3.89, 2.75, 4.5, 4.25, 3.55, 3.85, 2.98, 4.05, 4.50, 3.5, 2.98, 3.5, 3.25, 4.25, 4.0, 4.0, 3.5, 3.8, 2.5],
      'v3': [4.0, 3.85, 3.75, 4.0, 4.73, 3.5, 2.98, 3.5, 3.25, 4.25, 3.3, 4.0, 3.8, 2.5, 4.5, 3.3, 4.0, 3.8, 2.5, 4.5]} 
df2 = pd.DataFrame(d2)

dflst = [df1, df2]

and the code I am running on them:

from sklearn.linear_model import Ridge

# function to run ridge regression
def ridgereg(data, ynum=1): 

    y = np.asarray(data.iloc[:, 0:ynum])
    X = np.asarray(data.iloc[:, ynum:])
        
    model = Ridge(alpha=1.0).fit(X,y)
    return model.coef_

# iterate over list of dfs
for x in range(1, len(dflst)):
    resampled_model = {}
    resampled_model[x] = ridgereg(dflst[x], ynum=1)

print(resampled_model)

Upvotes: 1

Views: 42

Answers (1)

j1-lee

Reputation: 13939

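In the for loop, you are creating a new dict at each iteration, throwing the previously made dict away. There is also a second problem: range(1, len(dflst)) starts at 1, but list indices start at 0, so with two DataFrames the loop runs only once (for dflst[1]) and dflst[0] is never processed.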
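To see the two problems separately, here is a minimal sketch that swaps ridgereg for a hypothetical stand-in, fake_fit (it only returns the length of its input, so no scikit-learn or pandas is needed). It uses three toy samples, because with only two the two bugs happen to produce the same result. The helper names original_loop, dict_moved_out and both_fixed are illustrative, not from the question.

```python
# Hypothetical stand-in for ridgereg, so the sketch runs without
# scikit-learn or pandas: it just returns the length of its input.
def fake_fit(data):
    return len(data)


def original_loop(samples):
    # Same structure as the question: the dict is re-created on every
    # pass, and range() starts at 1, so index 0 is skipped.
    for x in range(1, len(samples)):
        result = {}
        result[x] = fake_fit(samples[x])
    return result


def dict_moved_out(samples):
    # Dict created once, but range(1, len(samples)) still skips index 0.
    result = {}
    for x in range(1, len(samples)):
        result[x] = fake_fit(samples[x])
    return result


def both_fixed(samples):
    # Dict created once, and enumerate() visits every element.
    result = {}
    for x, sample in enumerate(samples):
        result[x] = fake_fit(sample)
    return result


samples = [[1, 2, 3], [4, 5], [6]]  # three toy "bootstrap samples"
print(original_loop(samples))   # {2: 1}  only the last pass survives
print(dict_moved_out(samples))  # {1: 2, 2: 1}  first sample still missing
print(both_fixed(samples))      # {0: 3, 1: 2, 2: 1}
```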

Try (using enumerate):

resampled_model = {} # note that it is outside the loop
for i, df in enumerate(dflst, start=1):
    resampled_model[i] = ridgereg(df, ynum=1)

print(resampled_model)
# {1: array([[0.35603345, 0.1373456 ]]), 2: array([[ 0.08019198, -0.10895105]])}
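The nested brackets in that output come from how ridgereg builds y: data.iloc[:, 0:ynum] is a slice, so y stays two-dimensional with shape (n_samples, ynum), and scikit-learn's Ridge then stores coef_ with shape (n_targets, n_features), here (1, 2). A small NumPy-only sketch of the slice-versus-index difference follows; the array values is made-up data standing in for the DataFrame, not the question's numbers.

```python
import numpy as np

# Made-up 20 x 3 matrix standing in for the DataFrame (columns v1, v2, v3)
values = np.arange(60, dtype=float).reshape(20, 3)

y_slice = values[:, 0:1]  # slice, analogous to data.iloc[:, 0:ynum] with ynum=1
y_index = values[:, 0]    # integer index, analogous to data.iloc[:, 0]
X = values[:, 1:]         # remaining columns as features

print(y_slice.shape)  # (20, 1)  -> 2-D target, so coef_ is (1, n_features)
print(y_index.shape)  # (20,)    -> 1-D target, so coef_ is (n_features,)
print(X.shape)        # (20, 2)
```

If a flat coefficient vector is preferred for a single target, passing a 1-D y or calling .ravel() on the returned coefficients should give it.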

Instead of the for loop, you can also use a dict comprehension:

resampled_model = {i: ridgereg(df, ynum=1) for i, df in enumerate(dflst, start=1)}
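As a quick check that the loop and the comprehension behave the same, here is a sketch using a hypothetical stand-in, fake_fit, in place of ridgereg (it only returns the sample's length, so it runs without scikit-learn). It also shows that start=1 changes only the keys: unlike range(1, len(dflst)), enumerate still visits dflst[0].

```python
# Hypothetical stand-in for ridgereg (returns the sample's length) so this
# runs without scikit-learn or pandas.
def fake_fit(data):
    return len(data)

dflst = [[1, 2, 3], [4, 5]]  # two toy "samples"

# Explicit loop with the dict created once, before the loop
loop_result = {}
for i, df in enumerate(dflst, start=1):
    loop_result[i] = fake_fit(df)

# Equivalent dict comprehension
comp_result = {i: fake_fit(df) for i, df in enumerate(dflst, start=1)}

print(loop_result == comp_result)  # True
print(comp_result)                 # {1: 3, 2: 2}  key 1 is dflst[0]
```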

Upvotes: 1
