Warzone

Reputation: 13

How to update values in pandas dataframe in a for loop?

I am trying to build a data frame that stores the coefficient values of my model after each iteration. I can plot the graph after each iteration, but when I try to insert the values into the data frame, I get this error:

None of [Int64Index([ 3169,  3170,  3171,  3172,  3173,  3174,  3175,  3176,  3177,
             3178,
            ...
            31671, 31672, 31673, 31674, 31675, 31676, 31677, 31678, 31679,
            31680],
           dtype='int64', length=28512)] are in the [columns]

This is the code I use:

import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

kf = KFold(n_splits=10)
cvlasso = Lasso(alpha=0.001)
count = 1

var = pd.DataFrame()


for train, _ in kf.split(X, Y):
    cvlasso.fit(X.iloc[train, :], Y.iloc[train])
    importances_index_desc = cvlasso.coef_
    feature_labels = list(X.columns.values)
    importance = pd.Series(importances_index_desc, feature_labels)
    plt.figure()
    plt.bar(feature_labels, importances_index_desc)
    plt.xticks(feature_labels, rotation='vertical')
    plt.ylabel('Importance')
    plt.xlabel('Features')
    plt.title('Fold {}'.format(count))
    count = count + 1
    var[train] = importances_index_desc

plt.show()

One more thing: my dataset has 33000 observations in total, but at the end of the loop train holds only 28512 values. Does anyone know why train does not have 33000 values?

Upvotes: 0

Views: 743

Answers (3)

hiranyajaya

Reputation: 569

Try the following.

Instead of

var = pd.DataFrame()

create the dataframe with a column heading:

var = pd.DataFrame(columns=['impt_idx_desc'])

Then, inside the loop, use loc:

var.loc[count] = [importances_index_desc]

where count is incremented by 1 on each iteration.

Upvotes: 0

Ramsha Siddiqui

Reputation: 480

Another solution could be to use pandas.DataFrame.append(pandas.DataFrame):

importances_index_desc = pd.DataFrame(importances_index_desc)
var = var.append(importances_index_desc)
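Note that DataFrame.append was deprecated in pandas 1.4 and removed in pandas 2.0, so on a recent install the same idea would have to go through pd.concat. Here is a minimal sketch of that variant. The feature names and coefficient arrays are made-up stand-ins for X.columns and cvlasso.coef_, not values from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X.columns and for cvlasso.coef_ in three folds.
feature_labels = ['f1', 'f2', 'f3']
fold_coefs = [
    np.array([0.1, 0.0, -0.3]),
    np.array([0.2, 0.0, -0.1]),
    np.array([0.0, 0.5, -0.2]),
]

rows = []
for fold, importances_index_desc in enumerate(fold_coefs, start=1):
    # One single-row frame per fold: index = fold number, columns = features.
    rows.append(
        pd.DataFrame([importances_index_desc], columns=feature_labels, index=[fold])
    )

# Concatenate once after the loop instead of growing the frame step by step.
var = pd.concat(rows)
print(var)
```

Collecting the rows in a list and concatenating once at the end also avoids copying the whole frame on every iteration.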

Let me know if this helps!

Upvotes: 0

loginmind

Reputation: 603

train is the array of row indices of the training data returned by KFold. Using it as a column selector in var[train] causes the error, because none of those index values is a column of the DataFrame.
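This also seems to explain the 28512: with n_splits=10, each fold holds out about one tenth of the rows for testing, so train only ever contains about nine tenths of them. A small sketch to check this, assuming X has 31681 rows (that number is inferred from the index range in the error message, not stated in the question), with placeholder data since only the row count matters:

```python
import numpy as np
from sklearn.model_selection import KFold

n_rows = 31681  # assumption: inferred from the index range in the error message
X_demo = np.zeros((n_rows, 1))  # placeholder data, only the row count matters

kf = KFold(n_splits=10)
sizes = []
first_train = None
for train, test in kf.split(X_demo):
    if first_train is None:
        first_train = train  # positional row indices, not column labels
    sizes.append((len(train), len(test)))

print(first_train[0], first_train[-1])  # 3169 31680
print(sizes[0])                         # (28512, 3169)
```

If your full dataset really has 33000 rows, it may be worth checking len(X), since X might already be a subset of it.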

IMO, using a complicated value as an index is not a good idea; just use a simple value as the index, for example:

var.loc[count] = importances_index_desc
count += 1
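One caveat: as far as I know, assigning a row with loc to a DataFrame that has no columns at all raises "cannot set a frame with no defined columns", so var probably needs its columns defined up front. A minimal sketch under that assumption, with made-up feature names and coefficients standing in for X.columns and cvlasso.coef_:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X.columns and for cvlasso.coef_ in three folds.
feature_labels = ['f1', 'f2', 'f3']
fold_coefs = [
    np.array([0.1, 0.0, -0.3]),
    np.array([0.2, 0.0, -0.1]),
    np.array([0.0, 0.5, -0.2]),
]

# Define the columns first so each fold can be added as one row.
var = pd.DataFrame(columns=feature_labels)

count = 1
for importances_index_desc in fold_coefs:
    # The fold number is the row label; one coefficient per feature column.
    var.loc[count] = importances_index_desc
    count += 1

print(var)
```

This gives one row per fold and one column per feature, which makes it easy to compare a feature's coefficient across folds.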

Upvotes: 1
