dkk
dkk

Reputation: 171

How to do subtraction of fields in the same column - Python

I have the following initial dataframe :

enter image description here

Country      Gender     Q1
Australia    M          Increase
Australia    F          Increase
Australia    M          Increase
Australia    M          Decrease
India        M          Decrease
India        F          Increase
India        M          Decrease
South Africa F          Decrease
UK           F          Increase

Here we are counting Q1 for each filter :

enter image description here

Country       Gender      Q1         Count(Q1)
Australia       M         Increase    2
Australia       F         Increase    1
Australia       M         Decrease    1
India           M         Decrease    2
India           F         Increase    1
South Africa    F         Decrease    1
UK              F         Increase    1

Now for the same filter , I need to subtract the count(increase) with count(decrease)ans store in a new option in Q1 "Net". Thus Net = count(increase) - count(decrease). The final dataframe looks like that :

enter image description here

Is this feasible in Python ?

Regards,

Upvotes: 1

Views: 148

Answers (2)

braintho
braintho

Reputation: 401

Exactly what you wanted

df = pd.DataFrame({
    'Country':['Australia','Australia','Australia','Australia',
              'India','India','India',
              'South Africa','UK'],
    'Gender':['M','F','M','M',
             'M','F','M',
             'F','F'],
    'Q1':['Increase','Increase','Increase','Decrease',
          'Decrease','Increase','Decrease',
          'Decrease','Increase']
})
df1 = pd.DataFrame(df.groupby(['Country','Gender','Q1'])['Q1']\
                   .count()).rename(columns={'Q1':'CountQ1'}).reset_index()
df2 =pd.pivot_table(df,index = ['Country','Gender'],
                  columns = 'Q1',
                  values = 'Q1',
                  aggfunc = lambda x: len(x),
                  margins_name = 'Q1').fillna(0).reset_index().copy()

df2 = df2.assign(CountQ1 = df2.Increase - df2.Decrease,
           Q1='Net',
           Net = df2.apply(lambda x: f'({int(x.Increase)}-{int(x.Decrease)})',axis=1))\
.drop(['Increase','Decrease'],axis=1)
df3 = pd.concat([df1,df2]).sort_values(['Country','Gender'],ascending=(1,0))[['Country','Gender','Q1','CountQ1','Net']]
df3

Output

        Country Gender        Q1  CountQ1    Net
1     Australia      M  Decrease      1.0    NaN
2     Australia      M  Increase      2.0    NaN
1     Australia      M       Net      1.0  (2-1)
0     Australia      F  Increase      1.0    NaN
0     Australia      F       Net      1.0  (1-0)
4         India      M  Decrease      2.0    NaN
3         India      M       Net     -2.0  (0-2)
3         India      F  Increase      1.0    NaN
2         India      F       Net      1.0  (1-0)
5  South Africa      F  Decrease      1.0    NaN
4  South Africa      F       Net     -1.0  (0-1)
6            UK      F  Increase      1.0    NaN
5            UK      F       Net      1.0  (1-0)

Upvotes: 1

Nev1111
Nev1111

Reputation: 1049

Add a default_count column to the initial dataframe and then use pivot_table

   df=pd.DataFrame({'country':['Australia','Australia','Australia','Australia','India','India','India','South Africa','UK'],'gender':['M','F','M','M','M','F','M','F','F'],'Q1':['Increase','Increase','Increase','Decrease','Decrease','Increase','Decrease','Decrease','Increase'],'default_count':['1','1','1','1','1','1','1','1','1']})

   df_pivot=df.pivot_table(index=['country','gender'],columns='Q1',aggfunc='count',values='default_count').reset_index().fillna(0)

   df_pivot['Net']=df_pivot['Increase']-df_pivot['Decrease']

Upvotes: 2

Related Questions