Reputation: 35
I am new to Python, so its probable that I am just not wording this properly to find the answer.
Using Pandas I was able to find the most frequent N words for every record in the description field of my data. However, I have two columns; a categorical column and the description field. How to I find the most common word per category?
Ex Data:
- Property|Description
- House| Blue, Two stories, pool
- Car| Green, Dented, Manual, New
- Car| Blue, Automatic, Heated Seat
- House|New, Furnished, HOA
- Car|Blue, Old, Multiple Owners
My current code will return Blue=3, New=2 etc. But what I need to know is that Blue appeared for the word Car twice and once for House.
Current relevant code
words = (data.Description.str.lower().str.cat(sep=' ').split())
keywords=pandas.DataFrame(Counter(words).most_common(10), columns=['Words', 'Frequency'])
Upvotes: 3
Views: 1514
Reputation: 26676
Data
df=pd.DataFrame({'Property':['House','Car','Car','House','Car'],'Description':['Blue,Two stories,pool','Green,Dented,Manual,New','Blue,Automatic,Heated Seat','Blue,Furnished,HOA','Blue,Old,Multiple Owners']})
Chained solution df.assign(words=df.Description.str.lower().str.split(',')).explode('words').groupby('Property')['words'].value_counts()
Explanation with a breakdown
#Create list
df['words'] = df.Description.str.lower().str.split(',')
#Explode and count
df=df.explode('words').groupby('Property')['words'].value_counts()
Property words
Car blue 2
automatic 1
dented 1
green 1
heated seat 1
manual 1
multiple owners 1
new 1
old 1
House blue 2
furnished 1
hoa 1
pool 1
two stories 1
Name: words, dtype: int64
Upvotes: 1
Reputation: 8302
Try this, split the row values by delimeter then apply explode to transform each element of a list-like to a row, finally Groupby
# remove leading white space's & split by delimiter
df['Description'] = df['Description'].str.strip()\
.str.replace(",\s+", ",")\
.str.split(',')
# apply group by to get count of each word.
print(df.explode(column='Description').
groupby(["Property","Description"]).size().reset_index(name='count'))
ouptut,
Property Description count
0 Car Automatic 1
1 Car Blue 2
2 Car Dented 1
3 Car Green 1
4 Car Heated Seat 1
...
Upvotes: 1
Reputation: 422
use groupby before counting: doc. and then you can do counting on each group
df = pd.DataFrame(...)
groups = df.groupby(['column_name'])
for group in groups:
do_counting()
Upvotes: 0