MYjx

Reputation: 4407

Use a for loop to concatenate DataFrames into a larger DataFrame

At every step of my for loop a new DataFrame is generated. I want to concatenate these DataFrames into one larger DataFrame, but my function only returns the result of the last step rather than the merged result.

def crossV(clf,data,n):
    cvResult=pd.DataFrame()
    for i in range(n+2)[2:]:
        cvResult=pd.DataFrame()
        tt=array(tuple(x[1:i] for x in data))
        qq=array(tuple(x[0] for x in data))
        recall_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='recall')*100
        precision_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='precision')*100
        accuracy_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='accuracy')*100
        index_i=Series(np.repeat(i-1,10))
        classifier_i=Series(np.repeat(str(clf)[:7],10))
        recall_rate=Series(recall_rate)
        precision_rate=Series(precision_rate)
        accuracy_rate=Series(accuracy_rate)
        rate={"classfier":classifier_i,"model":index_i,"recall":recall_rate,"precision":precision_rate,"accuracy":accuracy_rate}
        result=pd.concat(rate,axis=1)
    cvResult=cvResult.append(result)
    return(cvResult)

Thanks!

Upvotes: 0

Views: 110

Answers (1)

fast tooth

Reputation: 2459

This might not be the right answer, but it is more readable when written as an answer.

I think the right logic should be the following (but I could be very wrong):

def crossV(clf,data,n):
    cvResult=pd.DataFrame() # create the empty DF once, before the loop
    for i in range(n+2)[2:]:
        # cvResult=pd.DataFrame() -- remove this line; it resets the accumulated result on every iteration
        tt=array(tuple(x[1:i] for x in data))
        qq=array(tuple(x[0] for x in data))
        recall_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='recall')*100
        precision_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='precision')*100
        accuracy_rate=cross_validation.cross_val_score(clf, tt, qq, cv=10,scoring='accuracy')*100
        index_i=Series(np.repeat(i-1,10))
        classifier_i=Series(np.repeat(str(clf)[:7],10))
        recall_rate=Series(recall_rate)
        precision_rate=Series(precision_rate)
        accuracy_rate=Series(accuracy_rate)
        rate={"classfier":classifier_i,"model":index_i,"recall":recall_rate,"precision":precision_rate,"accuracy":accuracy_rate}
        # keep this line: it turns the dict of Series into a DataFrame, one column per key
        result=pd.concat(rate,axis=1)

        # move this line inside the loop and make a little change:
        #cvResult=cvResult.append(result)
        cvResult = pd.concat([cvResult, result], ignore_index=True)
    return(cvResult)

Can you please try this and let us know if it works? I think there are two problems in your code: cvResult is re-created as an empty DataFrame on every iteration, and the append happens after the loop has finished, so only the result of the last step is kept. Also note that pd.concat(obj) expects obj to be a list of pandas objects or a dict of pd.Series (as in your rate), so the dict has to be turned into a DataFrame (result) before it can be concatenated with cvResult. But, again, I could be wrong.
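As an alternative to growing the DataFrame inside the loop, a common pattern is to collect each step's DataFrame in a plain list and call pd.concat once at the end. The sketch below only illustrates that pattern and is not your actual function: fake_scores is a hypothetical stand-in for cross_val_score so the example runs without scikit-learn, and collect_results, n and k are names made up for the illustration.

```python
import numpy as np
import pandas as pd


def fake_scores(i, k=10):
    """Hypothetical stand-in for cross_val_score: k deterministic scores."""
    return np.linspace(50.0, 90.0, k) + i


def collect_results(n, k=10):
    frames = []  # one DataFrame per loop iteration
    for i in range(2, n + 2):
        frame = pd.DataFrame({
            "model": np.repeat(i - 1, k),   # same label for all k rows of this step
            "accuracy": fake_scores(i, k),
        })
        frames.append(frame)  # list.append, not DataFrame.append
    # Concatenate once, after the loop; ignore_index renumbers the rows.
    return pd.concat(frames, ignore_index=True)


cv_result = collect_results(n=3)
print(cv_result.shape)
```

With n=3 the loop runs three times, so the result has 30 rows. One thing to keep in mind is that pd.concat raises a ValueError when the list is empty, so n=0 would need separate handling.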

Upvotes: 1
