Reputation: 315
I have a pandas dataframe that looks something like:
location skills
0 Washington excel
1 Chicago
2 West Lebanon r excel
3 Midland
4 Washington sql java
5 Cincinnati sql java scala python.
6 Dover sas sql r spss
7 Dover sas c++ spss
With a lot more rows of course. I want to get the frequency of every unique string inside the 'skills' column for every location.
So for example, for location Dover, i want the frequency of sas to be 2, spss: 2, r : 1, excel: 0, java: 0 and so forth..How would I go about doing this?
Upvotes: 1
Views: 266
Reputation: 294228
use str.cat
to combine column. Then split and use pd.value_counts
pd.value_counts(df.skills.str.cat(sep=' ').split())
sql 3
excel 2
java 2
spss 2
sas 2
r 2
scala 1
python. 1
c++ 1
dtype: int64
To get this by location
f = lambda x: pd.value_counts(x.str.cat(sep=' ').split())
df.groupby('location').skills.apply(f).unstack(fill_value=0)
c++ excel java python. r sas scala spss sql
location
Cincinnati 0 0 1 1 0 0 1 0 1
Dover 1 0 0 0 1 2 0 2 1
Washington 0 1 1 0 0 0 0 0 1
West Lebanon 0 1 0 0 1 0 0 0 0
Upvotes: 2