Reputation: 171
I have the following initial dataframe :
Country Gender Q1
Australia M Increase
Australia F Increase
Australia M Increase
Australia M Decrease
India M Decrease
India F Increase
India M Decrease
South Africa F Decrease
UK F Increase
Here we are counting Q1 for each filter :
Country Gender Q1 Count(Q1)
Australia M Increase 2
Australia F Increase 1
Australia M Decrease 1
India M Decrease 2
India F Increase 1
South Africa F Decrease 1
UK F Increase 1
Now for the same filter , I need to subtract the count(increase) with count(decrease)ans store in a new option in Q1 "Net". Thus Net = count(increase) - count(decrease). The final dataframe looks like that :
Is this feasible in Python ?
Regards,
Upvotes: 1
Views: 148
Reputation: 401
Exactly what you wanted
df = pd.DataFrame({
'Country':['Australia','Australia','Australia','Australia',
'India','India','India',
'South Africa','UK'],
'Gender':['M','F','M','M',
'M','F','M',
'F','F'],
'Q1':['Increase','Increase','Increase','Decrease',
'Decrease','Increase','Decrease',
'Decrease','Increase']
})
df1 = pd.DataFrame(df.groupby(['Country','Gender','Q1'])['Q1']\
.count()).rename(columns={'Q1':'CountQ1'}).reset_index()
df2 =pd.pivot_table(df,index = ['Country','Gender'],
columns = 'Q1',
values = 'Q1',
aggfunc = lambda x: len(x),
margins_name = 'Q1').fillna(0).reset_index().copy()
df2 = df2.assign(CountQ1 = df2.Increase - df2.Decrease,
Q1='Net',
Net = df2.apply(lambda x: f'({int(x.Increase)}-{int(x.Decrease)})',axis=1))\
.drop(['Increase','Decrease'],axis=1)
df3 = pd.concat([df1,df2]).sort_values(['Country','Gender'],ascending=(1,0))[['Country','Gender','Q1','CountQ1','Net']]
df3
Output
Country Gender Q1 CountQ1 Net
1 Australia M Decrease 1.0 NaN
2 Australia M Increase 2.0 NaN
1 Australia M Net 1.0 (2-1)
0 Australia F Increase 1.0 NaN
0 Australia F Net 1.0 (1-0)
4 India M Decrease 2.0 NaN
3 India M Net -2.0 (0-2)
3 India F Increase 1.0 NaN
2 India F Net 1.0 (1-0)
5 South Africa F Decrease 1.0 NaN
4 South Africa F Net -1.0 (0-1)
6 UK F Increase 1.0 NaN
5 UK F Net 1.0 (1-0)
Upvotes: 1
Reputation: 1049
Add a default_count column to the initial dataframe and then use pivot_table
df=pd.DataFrame({'country':['Australia','Australia','Australia','Australia','India','India','India','South Africa','UK'],'gender':['M','F','M','M','M','F','M','F','F'],'Q1':['Increase','Increase','Increase','Decrease','Decrease','Increase','Decrease','Decrease','Increase'],'default_count':['1','1','1','1','1','1','1','1','1']})
df_pivot=df.pivot_table(index=['country','gender'],columns='Q1',aggfunc='count',values='default_count').reset_index().fillna(0)
df_pivot['Net']=df_pivot['Increase']-df_pivot['Decrease']
Upvotes: 2