sequence_hard

Reputation: 5385

How to loop through a dataframe, create a new column and append values to it in Python

I have the following problem. I have a dataframe with several columns, one of those contains strings as values. I want to loop through this column, change those values and save the changed values in a new column.

The code I have written so far looks like this:

def get_classes(x):    
    for index, string in df['column'].iteritems():
        listi = string.split(',')
        Classes=[]

        for value in listi:
            count=listi.count(value)
            if count >= 3: 
                Classes.append(value)

        Unique=(',').join(sorted(list(set(Classes))))
        df['NewColumn']=Unique


End.apply(get_classes)

It loops through the rows of df['column'], splits each string at every comma (creating a list called listi), and creates an empty list called Classes. It then counts each value in listi and appends it to Classes if it occurs at least three times in the list. The finished list is passed through set(), so that every value is unique, then sorted and finally joined with commas into a string again. I then want to store this string in a new column, at the same index position as the row it was derived from. For example:

df
  column    NewColumn
0 A,A,A,C   A 
1 C,B,C,C   C
2 B,B,B,B   B

My code seems to work fine when I do print Unique instead of df['NewColumn']=Unique, as it then prints all the transformed values. If I execute the code like in my example however, the NewColumn of the dataframe is completely filled with the same value, which seems to correspond to the original value of the last row in the df. Can someone explain to me what the problem here is?

Upvotes: 2

Views: 3699

Answers (1)

Colonel Beauvel

Reputation: 31161

You can use the powerful Counter class from the collections module:

from collections import Counter

# .items() works on both Python 2 and Python 3 (.iteritems() is Python 2 only)
foo = lambda x: ','.join(sorted([k for k,v in Counter(x).items() if v>=3]))

df['new'] = df['column'].str.split(',').map(foo)


#In [33]: df
#Out[33]:
#    column NewColumn new
#0  A,A,A,C         A   A
#1  C,B,C,C         C   C
#2  B,B,B,B         B   B
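
As for why the original loop fills NewColumn with a single value: df['NewColumn']=Unique assigns one scalar to the whole column, so every iteration overwrites all rows and only the result for the last row survives. Below is a minimal, self-contained sketch of both the problem and a per-row fix for Python 3, assuming the sample data from the question. The helper name frequent_classes and its min_count parameter are made up for illustration and do not appear in the original code.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({"column": ["A,A,A,C", "C,B,C,C", "B,B,B,B"]})

# The problem: assigning a scalar to a column broadcasts it to every row,
# so each pass through the loop overwrites the previous result.
demo = df.copy()
for value in ["A", "C", "B"]:
    demo["NewColumn"] = value
print(demo["NewColumn"].tolist())  # ['B', 'B', 'B']


def frequent_classes(string, min_count=3):
    """Return the sorted, comma-joined values occurring at least min_count times.

    Hypothetical helper; the name and the min_count default are assumptions.
    """
    counts = Counter(string.split(","))
    return ",".join(sorted(k for k, v in counts.items() if v >= min_count))


# The fix: Series.map calls the function once per row and keeps the results
# aligned with the original index, so each row gets its own value.
df["NewColumn"] = df["column"].map(frequent_classes)
print(df["NewColumn"].tolist())  # ['A', 'C', 'B']
```

Writing the per-row logic as a function of one string, rather than a loop that touches df, is what lets map (or apply) put each result in the matching row.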

Upvotes: 2
