sgerbhctim

Reputation: 3640

Pandas GroupBy and total sum within group

Let's say I have a dataframe that looks like this:

    interview       longitude        latitude
1   A1                  34.2             90.2
2   A1                  54.2             23.5
3   A3                  32.1             21.5
4   A4                  54.3             93.1
5   A2                  45.1             29.5
6   A1                  NaN              NaN
7   A7                  NaN              NaN
8   A1                  NaN              NaN
9   A3                  23.1             38.2
10  A5                  -23.7            -98.4

I would like to perform some sort of groupby that outputs, for each group, the count of non-null values in each column, plus the total number of rows (occurs). So the desired output would be:

    interview         longitude         latitude       occurs 
1   A1                  2                2              4
2   A2                  1                1              1
3   A3                  2                2              2
4   A4                  1                1              1
5   A5                  1                1              1    
6   A7                  0                0              1

I tried this command, starting with latitude, but it doesn't give the desired output:

df.groupby(by=['interview', 'latitude'])['interview'].count()

Thanks!

Upvotes: 2

Views: 1477

Answers (3)

I_Al-thamary

Reputation: 3993

Here are three different ways you can do it:

    import pandas as pd
    import numpy as np

    df = pd.DataFrame({
        'interview': ['A1', 'A1', 'A3', 'A4', 'A2', 'A1', 'A7', 'A1', 'A3', 'A5'],
        'longitude': [34.2, 54.2, 32.1, 54.3, 45.1, np.nan, np.nan, np.nan, 23.1, -23.7],
        'latitude': [90.2, 23.5, 21.5, 93.1, 29.5, np.nan, np.nan, np.nan, 38.2, -98.4]})

    # first way: count() skips NaN, and a constant column gives the group size
    df['occurs'] = 1
    print(df.groupby('interview')[['longitude', 'latitude', 'occurs']].count()
            .reset_index())

    # second way: join the group sizes with the per-column non-null counts
    gb = df.groupby('interview')
    counts = gb.size().to_frame(name='occurs')
    print(counts
          .join(gb.agg({'longitude': 'count'}))
          .join(gb.agg({'latitude': 'count'}))
          .reset_index())

    # third way, just for comparison: a single agg call
    print(df.groupby('interview').agg({'longitude': lambda x: x.notnull().sum(),
                                       'latitude': lambda x: x.notnull().sum(),
                                       'occurs': 'size'}).reset_index())
See the code here: https://repl.it/repls/OldVirtuousBooleanvalue
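
With pandas 0.25 or newer, named aggregation gives the same table in one call; a minimal sketch, assuming the df built above:

    df.groupby('interview').agg(longitude=('longitude', 'count'),
                                latitude=('latitude', 'count'),
                                occurs=('longitude', 'size')).reset_index()

Here count skips NaN while size does not, so ('longitude', 'size') doubles as the row count per group.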

Upvotes: 0

Vaishali

Reputation: 38415

No need to use agg; simply select the columns from the groupby. count returns the count of non-null values:

df.groupby('interview')[['interview','longitude','latitude']].count()


           interview  longitude  latitude
interview
A1                 4          2         2
A2                 1          1         1
A3                 2          2         2
A4                 1          1         1
A5                 1          1         1
A7                 1          0         0
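
To match the asker's column names exactly, the counted interview column can be renamed afterwards; a small sketch building on the line above:

    (df.groupby('interview')[['interview', 'longitude', 'latitude']]
       .count()
       .rename(columns={'interview': 'occurs'})
       .reset_index())

Renaming before reset_index avoids a clash between the new occurs column and the interview index.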

Upvotes: 2

BENY

Reputation: 323226

Using notna before groupby + sum:

s1 = df[['longitude', 'latitude']].notna().groupby(df['interview']).sum()
s2 = df.groupby(df['interview']).size()  # note: size counts the NaN rows as well
pd.concat([s1, s2.to_frame('occurs')], axis=1)
Out[115]:
           longitude  latitude  occurs
interview
A1               2.0       2.0       4
A2               1.0       1.0       1
A3               2.0       2.0       2
A4               1.0       1.0       1
A5               1.0       1.0       1
A7               0.0       0.0       1
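
If you prefer to avoid concat, joining the size Series onto s1 works too; a minimal sketch, assuming the s1 and s2 from above:

    s1.astype(int).join(s2.rename('occurs')).reset_index()

The astype(int) is only there to turn the 2.0-style float counts back into integers.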

Upvotes: 2
