harry04
harry04

Reputation: 962

How can I count the frequency of repeated values in dataframe column?

I have a column in a dataframe that is

UC      WR
V001    A, B, C, nan, A, C, D
C001    nan, C, D, A, nan, A
C002    C, B, B, A, A, A
C003    A, C, A, C, B, nan

I'm not sure what I'm doing wrong, but I'm not able to get rid of the nans. From this column, I want to have a different column, or a dictionary that gives me the frequency count of the different values in WR.

UC     WR Count
V001  {A: 2, B:1, C:2, D:1}
C001  {A:2, C:1, D:1}
C002  {A:3, B:2, C:1}
C003  {A:2, B:1, C:2}

or a similar dictionary. Thanks! :)

Upvotes: 0

Views: 162

Answers (3)

Loochie
Loochie

Reputation: 2472

To get the values as dictionaries you may also try:

df['WR Count'] = df['WR'].apply(lambda x: dict(Counter(x.split(', ')))

Upvotes: 0

SpghttCd
SpghttCd

Reputation: 10860

At first ignoring the nan entries, my approach would be:

df['WR Count'] = df.WR.str.replace(' ', '').str.split(',').apply(Counter)

#                          WR                                    WR Count
# UC                                                                                                        
# V001  A, B, C, nan, A, C, D  {'A': 2, 'B': 1, 'C': 2, 'nan': 1, 'D': 1}                               
# C001   nan, C, D, A, nan, A          {'nan': 2, 'C': 1, 'D': 1, 'A': 2}                               
# C002       C, B, B, A, A, A                    {'C': 1, 'B': 2, 'A': 3}                           
# C003     A, C, A, C, B, nan          {'A': 2, 'C': 2, 'B': 1, 'nan': 1} 

Note that if you are sure that the separator is always ', ', then you can hardcode it, which leads to a shorter command:

df['WR Count'] = df.WR.str.split(', ').apply(Counter)

Upvotes: 3

BENY
BENY

Reputation: 323226

Just do not make the dict into the cell in pandas, which will make a lots of build-in pandas' nice function not work any more

df.set_index('UC').WR.\
 str.split(', ',expand=True).\
    stack().str.get_dummies().sum(level=0).drop('nan',1)
      A  B  C  D
UC              
V001  2  1  2  1
C001  2  0  1  1
C002  3  2  1  0
C003  2  1  2  0

Upvotes: 1

Related Questions