Reputation: 4407
My question: on every iteration of the for loop, a new DataFrame is generated. I want to concatenate these DataFrames into one larger DataFrame, but my function returns only the result of the last iteration rather than the combined result.
def crossV(clf,data,n):
    cvResult=pd.DataFrame()
    for i in range(n+2)[2:]:
        cvResult=pd.DataFrame()
        tt=array(tuple(x[1:i] for x in data))
        qq=array(tuple(x[0] for x in data))
        recall_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='recall')*100
        precision_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='precision')*100
        accuracy_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='accuracy')*100
        index_i=Series(np.repeat(i-1,10))
        classifier_i=Series(np.repeat(str(clf)[:7],10))
        recall_rate=Series(recall_rate)
        precision_rate=Series(precision_rate)
        accuracy_rate=Series(accuracy_rate)
        rate={"classfier":classifier_i,"model":index_i,"recall":recall_rate,"precision":precision_rate,"accuracy":accuracy_rate}
        result=pd.concat(rate,axis=1)
        cvResult=cvResult.append(result)
    return(cvResult)
Thanks!
Upvotes: 0
Views: 110
Reputation: 2459
This might not be the right answer, but it is more readable when written as an answer.
I think the right logic should be (but I could be very wrong):
import numpy as np
import pandas as pd
from numpy import array
from pandas import Series
from sklearn import cross_validation  # scikit-learn < 0.20; later versions use sklearn.model_selection

def crossV(clf, data, n):
    cvResult = pd.DataFrame()  # create the empty DataFrame once, before the loop
    for i in range(n+2)[2:]:
        # cvResult=pd.DataFrame() -- removed: it threw away the rows collected so far
        tt = array(tuple(x[1:i] for x in data))  # features: columns 1..i-1
        qq = array(tuple(x[0] for x in data))    # labels: column 0
        recall_rate = cross_validation.cross_val_score(clf, tt, qq, cv=10, scoring='recall')*100
        precision_rate = cross_validation.cross_val_score(clf, tt, qq, cv=10, scoring='precision')*100
        accuracy_rate = cross_validation.cross_val_score(clf, tt, qq, cv=10, scoring='accuracy')*100
        index_i = Series(np.repeat(i-1, 10))
        classifier_i = Series(np.repeat(str(clf)[:7], 10))
        recall_rate = Series(recall_rate)
        precision_rate = Series(precision_rate)
        accuracy_rate = Series(accuracy_rate)
        rate = {"classfier": classifier_i, "model": index_i, "recall": recall_rate, "precision": precision_rate, "accuracy": accuracy_rate}
        # I think you don't need the "result" variable:
        # build this iteration's DataFrame from the dict and concatenate it
        # onto the running result, inside the loop.
        cvResult = pd.concat([cvResult, pd.DataFrame(rate)], ignore_index=True)
    return cvResult
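Can you please try this and let us know if it works? I think the main problem is that cvResult is re-created as an empty DataFrame at the start of every iteration, so whatever was collected in earlier iterations is thrown away and only the last iteration's rows are returned. pd.concat takes a list (or dict) of pandas objects and joins them, so each iteration's scores have to be concatenated onto the running cvResult rather than onto a fresh frame. The separate "result" variable also seems unnecessary to me. But, again, I could be wrong.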
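To make the difference easier to see without scikit-learn, here is a minimal sketch of the same pattern. It assumes a hypothetical helper, fake_scores, that stands in for the cross-validation step and just returns a small DataFrame per iteration; the names collect_buggy and collect_fixed are also made up for illustration. It collects the per-iteration frames in a list and calls pd.concat once at the end, which is a variation on the code above rather than the same thing.

```python
import numpy as np
import pandas as pd


def fake_scores(i, folds=3):
    """Hypothetical stand-in for the cross-validation step: one small frame per i."""
    return pd.DataFrame({
        "model": np.repeat(i - 1, folds),
        "accuracy": np.linspace(50.0, 60.0, folds) + i,
    })


def collect_buggy(n):
    for i in range(2, n + 2):
        frames = []                    # reset inside the loop: earlier frames are lost
        frames.append(fake_scores(i))
    return pd.concat(frames, ignore_index=True)


def collect_fixed(n):
    frames = []                        # created once, before the loop
    for i in range(2, n + 2):
        frames.append(fake_scores(i))
    return pd.concat(frames, ignore_index=True)  # one concat at the end


buggy = collect_buggy(3)
fixed = collect_fixed(3)
print(len(buggy), len(fixed))
print(fixed["model"].unique())
```

With n=3 the buggy version keeps only the rows of the last iteration, while the fixed version keeps the rows of all three. Building a list and concatenating once also avoids copying the growing DataFrame on every iteration, and it does not rely on DataFrame.append, which was removed in pandas 2.0.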
Upvotes: 1