Reputation: 5385
I have the following problem. I have a dataframe with several columns, one of which contains strings as values. I want to loop through this column, change those values and save the changed values in a new column.
The code I have written so far looks like this:
def get_classes(x):
    for index, string in df['column'].iteritems():
        listi = string.split(',')
        Classes=[]
        for value in listi:
            count=listi.count(value)
            if count >= 3:
                Classes.append(value)
        Unique=(',').join(sorted(list(set(Classes))))
        df['NewColumn']=Unique
End.apply(get_classes)
It loops through the rows of df['column'], splitting the string at each comma (creating a list called listi), and creates an empty list called Classes.
It then counts each value in listi and appends it to Classes if it occurs at least three times in the list. The finished list is then passed through set(), so that all objects in it are unique, sorted, and finally joined with commas into a string again. I then want to store this string in a new column, at the same index position as the row it was derived from. For example:
df
column NewColumn
0 A,A,A,C A
1 C,B,C,C C
2 B,B,B,B B
My code seems to work fine when I do print Unique instead of df['NewColumn']=Unique, as it then prints all the transformed values. If I execute the code as in my example, however, the NewColumn of the dataframe is completely filled with the same value, which seems to correspond to the original value of the last row in the df. Can someone explain to me what the problem is?
Upvotes: 2
Views: 3699
Reputation: 31161
You can use the powerful Counter class from the collections module:
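from collections import Counter
# Keep the values that occur at least three times, sorted and joined by commas.
# .items() works on both Python 2 and 3 (.iteritems() is Python 2 only).
foo = lambda x: ','.join(sorted([k for k, v in Counter(x).items() if v >= 3]))
df['new'] = df['column'].str.split(',').map(foo)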
#In [33]: df
#Out[33]:
# column NewColumn new
#0 A,A,A,C A A
#1 C,B,C,C C C
#2 B,B,B,B B B
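As for why the original code fills the column with a single value: df['NewColumn']=Unique assigns one string to the whole column, and pandas broadcasts a scalar to every row. Each pass through the loop overwrites the entire column, so only the result computed in the last iteration is left. One way around this is to compute the result for a single string and let pandas place each result in its own row. The following is a minimal sketch of that per-row function in plain Python; the name classes_with_min_count and the min_count parameter are made up for illustration and are not part of the original code.

```python
from collections import Counter

def classes_with_min_count(string, min_count=3):
    """Return the comma-joined, sorted values occurring at least min_count times."""
    # Count how often each comma-separated value appears in this one string.
    counts = Counter(string.split(','))
    # Counter keys are already unique, so no set() is needed before sorting.
    return ','.join(sorted(value for value, n in counts.items() if n >= min_count))

# The three example rows from the question, processed one string at a time.
rows = ['A,A,A,C', 'C,B,C,C', 'B,B,B,B']
results = [classes_with_min_count(row) for row in rows]
print(results)  # ['A', 'C', 'B']
```

With the dataframe from the question, this function could be applied as df['NewColumn'] = df['column'].apply(classes_with_min_count), which returns one value per row instead of a single scalar for the whole column.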
Upvotes: 2