KReEd
KReEd

Reputation: 368

Create Pivot Table and get Count in Pandas Dataframe

I have my dataframe -

import pandas as pd
data = [['2233', 'A', 'FY21'], ['441', 'C', 'FY20'], ['6676', 'A', 'FY19'], ['033', 'C', 'FY16'], 
       ['12', 'A', 'FY18'], ['91', 'B', 'FY15'], ['6676', 'C', 'FY10'], ['441', 'C', 'FY17'], 
       ['12', 'A', 'FY14'], ['441', 'C', 'FY12']]
df = pd.DataFrame(data, columns = ('emp_id', 'category', 'year'))
df

emp_id   category   year
0   2233    A       FY21
1   441     C       FY20
2   6676    A       FY19
3   033     C       FY16
4   12      A       FY18
5   91      B       FY15
6   6676    C       FY10
7   441     C       FY17
8   12      A       FY14
9   441     C       FY12

So basically I want the categories should be created as individual column i.e A, B, & C and each column should contain the counts of them.

What I want as my output -

emp_id    A   B   C
0   2233  1     
1   441           3
2   6676  1     
3   033           1
4   12    2     
5   91        1 
6   6676          1

What I was trying -

df['count'] = df.groupby(['emp_id'])['category'].transform('count')
df.drop_duplicates('emp_id', inplace = True)
df
emp_id  category year   count
0   2233    A    FY21   1
1   441     C    FY20   3
2   6676    A    FY19   2
3   033     C    FY16   1
4   12      A    FY18   2
5   91      B    FY15   1

please help me to get my desired output in python.

Upvotes: 2

Views: 590

Answers (2)

IoaTzimas
IoaTzimas

Reputation: 10624

You can also use pivot table:

pv = pd.pivot_table(df, index='emp_id', columns='category', aggfunc='count')
pv.fillna('', inplace=True)

print(pv)

         year
category    A  B  C
emp_id
033               1
12          2
2233        1
441               3
6676        1     1
91             1

Upvotes: 1

Nk03
Nk03

Reputation: 14949

Use pd.crosstab:

df1 = pd.crosstab(df['emp_id'], df['category']).rename_axis(
    columns=None).reset_index()
OUTPUT:
  emp_id  A  B  C
0    033  0  0  1
1     12  2  0  0
2   2233  1  0  0
3    441  0  0  3
4   6676  1  0  1
5     91  0  1  0
NOTE:

If you don't need 0 in the output you can use:

df = pd.crosstab(df['emp_id'],  df['category']).rename_axis(
    columns=None).reset_index().replace(0, '')
OUTPUT:
  emp_id  A  B  C
0    033        1
1     12  2      
2   2233  1      
3    441        3
4   6676  1     1
5     91     1   
Updated Answer:
df = (
    df.reset_index()
    .pivot_table(
        index=['emp_id', df.groupby('emp_id')['year'].transform(', '.join)],
        columns='category',
        values='index',
        aggfunc='count',
        fill_value=0)
    .rename_axis(columns=None)
    .reset_index()
)
OUTPUT:
  emp_id              year  A  B  C
0    033              FY16  0  0  1
1     12        FY18, FY14  2  0  0
2   2233              FY21  1  0  0
3    441  FY20, FY17, FY12  0  0  3
4   6676        FY19, FY10  1  0  1
5     91              FY15  0  1  0

Upvotes: 3

Related Questions