Ricky
Ricky

Reputation: 315

grouping words inside pandas dataframe column by another column to get the frequency/count

I have a pandas dataframe that looks something like:

    location       skills
0   Washington     excel
1   Chicago     
2   West Lebanon   r excel
3   Midland     
4   Washington     sql java
5   Cincinnati     sql java scala python.
6   Dover          sas sql r spss
7   Dover          sas c++ spss

With a lot more rows of course. I want to get the frequency of every unique string inside the 'skills' column for every location.

So for example, for location Dover, i want the frequency of sas to be 2, spss: 2, r : 1, excel: 0, java: 0 and so forth..How would I go about doing this?

Upvotes: 1

Views: 266

Answers (1)

piRSquared
piRSquared

Reputation: 294228

use str.cat to combine column. Then split and use pd.value_counts

pd.value_counts(df.skills.str.cat(sep=' ').split())

sql        3
excel      2
java       2
spss       2
sas        2
r          2
scala      1
python.    1
c++        1
dtype: int64

To get this by location

f = lambda x: pd.value_counts(x.str.cat(sep=' ').split())
df.groupby('location').skills.apply(f).unstack(fill_value=0)

              c++  excel  java  python.  r  sas  scala  spss  sql
location                                                         
Cincinnati      0      0     1        1  0    0      1     0    1
Dover           1      0     0        0  1    2      0     2    1
Washington      0      1     1        0  0    0      0     0    1
West Lebanon    0      1     0        0  1    0      0     0    0

Upvotes: 2

Related Questions