Reputation: 1036
I have pandas data frame url like
location dom_category
3 'edu'
3 'gov'
3 'edu'
4 'org'
4 'others'
4 'org'
and i want this data frame to be like
location edu gov org others
3 2 1 0 0
4 0 0 2 1
the edu,gov,org and others contains the count for specific location. i have right the code but i know its not the optimized
url['val']=1
url_final=url.pivot_table(index=['location'],values='val',columns=
['dom_category'],aggfunc=np.sum)
Upvotes: 4
Views: 2969
Reputation: 863256
First if necessary remove '
by str.strip
.
Then use groupby
with aggregating size
and reshape by unstack
:
df['dom_category'] = df['dom_category'].str.strip("\'")
df = df.groupby(['location','dom_category']).size().unstack(fill_value=0)
print (df)
dom_category edu gov org others
location
3 2 1 0 0
4 0 0 2 1
Or use pivot_table
:
df['dom_category'] = df['dom_category'].str.strip("\'")
df=df.pivot_table(index='location',columns='dom_category',aggfunc='size', fill_value=0)
print (df)
dom_category edu gov org others
location
3 2 1 0 0
4 0 0 2 1
Last is possible convert index to column and remove columns name dom_category
by reset_index
+ rename_axis
:
df = df.reset_index().rename_axis(None, axis=1)
print (df)
location edu gov org others
0 3 2 1 0 0
1 4 0 0 2 1
Upvotes: 9
Reputation: 294488
Using groupby
and value_counts
House Keeping
get rid of '
df.dom_category = df.dom_category.str.strip("'")
Rest of Solution
df.groupby('location').dom_category.value_counts().unstack(fill_value=0)
dom_category edu gov org others
location
3 2 1 0 0
4 0 0 2 1
To get the formatting just right
df.groupby('location').dom_category.value_counts().unstack(fill_value=0) \
.reset_index().rename_axis(None, 1)
location edu gov org others
0 3 2 1 0 0
1 4 0 0 2 1
Upvotes: 3
Reputation: 153510
Let's use str.strip
, get_dummies
and groupby
:
df['dom_category'] = df.dom_category.str.strip("\'")
df.assign(**df.dom_category.str.get_dummies()).groupby('location').sum().reset_index()
Output:
location edu gov org others
0 3 2 1 0 0
1 4 0 0 2 1
Upvotes: 2