Reputation: 13
I am trying to build a data frame that stores the Lasso coefficient of every variable after each cross-validation iteration. Plotting the coefficients after each iteration works, but when I try to insert the values into the data frame after each iteration,
I get this error:
None of [Int64Index([ 3169, 3170, 3171, 3172, 3173, 3174, 3175, 3176, 3177, 3178,
            ...
            31671, 31672, 31673, 31674, 31675, 31676, 31677, 31678, 31679, 31680],
           dtype='int64', length=28512)] are in the [columns]
This is the code I use:
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.linear_model import Lasso
from sklearn.model_selection import KFold

# X (features) and Y (target) are my data, loaded earlier
kf = KFold(n_splits=10)
cvlasso = Lasso(alpha=0.001)
count = 1
var = pd.DataFrame()
for train, _ in kf.split(X, Y):
    cvlasso.fit(X.iloc[train, :], Y.iloc[train])
    importances_index_desc = cvlasso.coef_
    feature_labels = list(X.columns.values)
    importance = pd.Series(importances_index_desc, feature_labels)
    plt.figure()
    plt.bar(feature_labels, importances_index_desc)
    plt.xticks(feature_labels, rotation='vertical')
    plt.ylabel('Importance')
    plt.xlabel('Features')
    plt.title('Fold {}'.format(count))
    count = count + 1
    var[train] = importances_index_desc  # this line raises the error above
    plt.show()
One more thing: my dataset has 33000 observations in total, but at the end of the loop train holds only 28512 values. Does anyone know why it is not 33000?
Upvotes: 0
Views: 743
Reputation: 569
Try the following. Instead of
var = pd.DataFrame()
create a data frame with a column heading:
var = pd.DataFrame(columns=['impt_idx_desc'])
Then, in the loop, use .loc to add a row:
var.loc[count] = [importances_index_desc]
where count is incremented by 1 on each iteration.
Upvotes: 0
Reputation: 480
Another solution could be to use pandas.DataFrame.append(pandas.DataFrame). Note that DataFrame.append was removed in pandas 2.0, so this only works on older versions:
importances_index_desc = pd.DataFrame(importances_index_desc)
var = var.append(importances_index_desc)
Let me know if this helps!
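Since DataFrame.append was removed in pandas 2.0, here is a rough sketch of the same idea with pd.concat. The feature names and coefficient arrays are made-up stand-ins for X.columns and cvlasso.coef_, and the one-row-per-fold layout is an assumption about what the asker wants, not something stated in the question.

```python
import numpy as np
import pandas as pd

# Hypothetical feature names and per-fold coefficients, for illustration only.
feature_labels = ["f0", "f1", "f2"]
fold_coefs = [np.array([0.1, 0.0, -0.2]), np.array([0.3, 0.0, -0.1])]

rows = []
for fold, coefs in enumerate(fold_coefs, start=1):
    # Build a one-row frame: one column per feature, indexed by fold number.
    rows.append(pd.DataFrame([coefs], columns=feature_labels, index=[fold]))

# Concatenate once after the loop instead of growing the frame every iteration.
var = pd.concat(rows)
print(var)
```

Collecting the pieces in a list and concatenating once at the end also avoids copying the whole frame on every fold.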
Upvotes: 0
Reputation: 603
train is the array of row indices of the training data returned by KFold. Writing var[train] uses train as a set of column labels, which causes the error because none of those index values is a column of the DataFrame.
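To make the point about train concrete, and as a possible explanation for the 28512 in the question, here is a small sketch. The row count of 31681 is an inference from the indices in the error message (3169 through 31680), not something the asker confirmed; X_demo is a hypothetical placeholder for the real X.

```python
import numpy as np
from sklearn.model_selection import KFold

# Hypothetical data with 31681 rows, chosen so the first fold matches the
# indices in the error message; the real X is not shown in the question.
X_demo = np.zeros((31681, 2))

kf = KFold(n_splits=10)
train, test = next(kf.split(X_demo))  # first fold only

# train and test are NumPy arrays of row positions, not column labels.
print(type(train).__name__, len(train), len(test))
print(train[0], train[-1])
```

With n_splits=10, each fold holds out about one tenth of the rows for testing, so train always contains roughly nine tenths of the rows KFold was given rather than all of them. If the original dataset really has 33000 rows, the X passed to kf.split presumably has fewer, for example after dropping rows.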
IMO, using a complicated value as the index is not a good idea. Just use a simple value such as the fold counter, for example:
var.loc[count] = importances_index_desc
count += 1
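A minimal, self-contained sketch of that approach follows. It assumes var is created with the feature names as its columns, so that each row assigned through .loc has columns to line up with; the names and coefficient values are hypothetical stand-ins for X.columns and cvlasso.coef_.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-ins for X.columns and the per-fold cvlasso.coef_ arrays.
feature_labels = ["f0", "f1", "f2"]
fold_coefs = [np.array([0.1, 0.0, -0.2]), np.array([0.3, 0.0, -0.1])]

# Assumption: define the columns up front so each new row has a matching width.
var = pd.DataFrame(columns=feature_labels)

count = 1
for coefs in fold_coefs:
    var.loc[count] = coefs  # adds a new row labelled with the fold number
    count += 1

# The empty frame starts with object columns, so convert to numeric at the end.
var = var.astype(float)
print(var)
```

Each row is then one fold and each column one feature, which makes it easy to compare a coefficient across folds with something like var["f0"].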
Upvotes: 1