jarry jafery
jarry jafery

Reputation: 1036

pivot_table with group and without value field

I have pandas data frame url like

location  dom_category
3         'edu'
3         'gov'
3         'edu'
4         'org'
4         'others'
4         'org'

and i want this data frame to be like

location  edu   gov   org   others
3         2     1     0     0
4         0     0     2     1

the edu,gov,org and others contains the count for specific location. i have right the code but i know its not the optimized

url['val']=1
url_final=url.pivot_table(index=['location'],values='val',columns=
['dom_category'],aggfunc=np.sum)

Upvotes: 4

Views: 2969

Answers (3)

jezrael
jezrael

Reputation: 863256

First if necessary remove ' by str.strip.

Then use groupby with aggregating size and reshape by unstack:

df['dom_category'] = df['dom_category'].str.strip("\'")
df = df.groupby(['location','dom_category']).size().unstack(fill_value=0)
print (df)
dom_category  edu  gov  org  others
location                           
3               2    1    0       0
4               0    0    2       1

Or use pivot_table:

df['dom_category'] = df['dom_category'].str.strip("\'")
df=df.pivot_table(index='location',columns='dom_category',aggfunc='size', fill_value=0)
print (df)
dom_category  edu  gov  org  others
location                           
3               2    1    0       0
4               0    0    2       1

Last is possible convert index to column and remove columns name dom_category by reset_index + rename_axis:

df = df.reset_index().rename_axis(None, axis=1)
print (df)
   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1

Upvotes: 9

piRSquared
piRSquared

Reputation: 294488

Using groupby and value_counts

House Keeping
get rid of '

df.dom_category = df.dom_category.str.strip("'")

Rest of Solution

df.groupby('location').dom_category.value_counts().unstack(fill_value=0)

dom_category  edu  gov  org  others
location                           
3               2    1    0       0
4               0    0    2       1

To get the formatting just right

df.groupby('location').dom_category.value_counts().unstack(fill_value=0) \ 
  .reset_index().rename_axis(None, 1)

   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1

Upvotes: 3

Scott Boston
Scott Boston

Reputation: 153510

Let's use str.strip, get_dummies and groupby:

df['dom_category'] = df.dom_category.str.strip("\'")
df.assign(**df.dom_category.str.get_dummies()).groupby('location').sum().reset_index()

Output:

   location  edu  gov  org  others
0         3    2    1    0       0
1         4    0    0    2       1

Upvotes: 2

Related Questions