Reputation: 368
I have my dataframe -
import pandas as pd
data = [['2233', 'A', 'FY21'], ['441', 'C', 'FY20'], ['6676', 'A', 'FY19'], ['033', 'C', 'FY16'],
['12', 'A', 'FY18'], ['91', 'B', 'FY15'], ['6676', 'C', 'FY10'], ['441', 'C', 'FY17'],
['12', 'A', 'FY14'], ['441', 'C', 'FY12']]
df = pd.DataFrame(data, columns = ('emp_id', 'category', 'year'))
df
emp_id category year
0 2233 A FY21
1 441 C FY20
2 6676 A FY19
3 033 C FY16
4 12 A FY18
5 91 B FY15
6 6676 C FY10
7 441 C FY17
8 12 A FY14
9 441 C FY12
So basically I want the categories should be created as individual column i.e A, B, & C and each column should contain the counts of them.
What I want as my output -
emp_id A B C
0 2233 1
1 441 3
2 6676 1
3 033 1
4 12 2
5 91 1
6 6676 1
What I was trying -
df['count'] = df.groupby(['emp_id'])['category'].transform('count')
df.drop_duplicates('emp_id', inplace = True)
df
emp_id category year count
0 2233 A FY21 1
1 441 C FY20 3
2 6676 A FY19 2
3 033 C FY16 1
4 12 A FY18 2
5 91 B FY15 1
please help me to get my desired output in python.
Upvotes: 2
Views: 590
Reputation: 10624
You can also use pivot table:
pv = pd.pivot_table(df, index='emp_id', columns='category', aggfunc='count')
pv.fillna('', inplace=True)
print(pv)
year
category A B C
emp_id
033 1
12 2
2233 1
441 3
6676 1 1
91 1
Upvotes: 1
Reputation: 14949
Use pd.crosstab:
df1 = pd.crosstab(df['emp_id'], df['category']).rename_axis(
columns=None).reset_index()
emp_id A B C
0 033 0 0 1
1 12 2 0 0
2 2233 1 0 0
3 441 0 0 3
4 6676 1 0 1
5 91 0 1 0
If you don't need 0
in the output you can use:
df = pd.crosstab(df['emp_id'], df['category']).rename_axis(
columns=None).reset_index().replace(0, '')
emp_id A B C
0 033 1
1 12 2
2 2233 1
3 441 3
4 6676 1 1
5 91 1
df = (
df.reset_index()
.pivot_table(
index=['emp_id', df.groupby('emp_id')['year'].transform(', '.join)],
columns='category',
values='index',
aggfunc='count',
fill_value=0)
.rename_axis(columns=None)
.reset_index()
)
emp_id year A B C
0 033 FY16 0 0 1
1 12 FY18, FY14 2 0 0
2 2233 FY21 1 0 0
3 441 FY20, FY17, FY12 0 0 3
4 6676 FY19, FY10 1 0 1
5 91 FY15 0 1 0
Upvotes: 3