Reputation: 962
I have a column in a dataframe that is
UC WR
V001 A, B, C, nan, A, C, D
C001 nan, C, D, A, nan, A
C002 C, B, B, A, A, A
C003 A, C, A, C, B, nan
I'm not sure what I'm doing wrong, but I'm not able to get rid of the nan
s. From this column, I want to have a different column, or a dictionary that gives me the frequency count of the different values in WR
.
UC WR Count
V001 {A: 2, B:1, C:2, D:1}
C001 {A:2, C:1, D:1}
C002 {A:3, B:2, C:1}
C003 {A:2, B:1, C:2}
or a similar dictionary. Thanks! :)
Upvotes: 0
Views: 162
Reputation: 2472
To get the values as dictionaries you may also try:
df['WR Count'] = df['WR'].apply(lambda x: dict(Counter(x.split(', ')))
Upvotes: 0
Reputation: 10860
At first ignoring the nan
entries, my approach would be:
df['WR Count'] = df.WR.str.replace(' ', '').str.split(',').apply(Counter)
# WR WR Count
# UC
# V001 A, B, C, nan, A, C, D {'A': 2, 'B': 1, 'C': 2, 'nan': 1, 'D': 1}
# C001 nan, C, D, A, nan, A {'nan': 2, 'C': 1, 'D': 1, 'A': 2}
# C002 C, B, B, A, A, A {'C': 1, 'B': 2, 'A': 3}
# C003 A, C, A, C, B, nan {'A': 2, 'C': 2, 'B': 1, 'nan': 1}
Note that if you are sure that the separator is always ', '
, then you can hardcode it, which leads to a shorter command:
df['WR Count'] = df.WR.str.split(', ').apply(Counter)
Upvotes: 3
Reputation: 323226
Just do not make the dict
into the cell in pandas
, which will make a lots of build-in pandas
' nice function not work any more
df.set_index('UC').WR.\
str.split(', ',expand=True).\
stack().str.get_dummies().sum(level=0).drop('nan',1)
A B C D
UC
V001 2 1 2 1
C001 2 0 1 1
C002 3 2 1 0
C003 2 1 2 0
Upvotes: 1