Reputation: 3640
Let's say I have a dataframe that looks like this:
interview longitude latitude
1 A1 34.2 90.2
2 A1 54.2 23.5
3 A3 32.1 21.5
4 A4 54.3 93.1
5 A2 45.1 29.5
6 A1 NaN NaN
7 A7 NaN NaN
8 A1 NaN NaN
9 A3 23.1 38.2
10 A5 -23.7 -98.4
I would like to be able to perform some sort of groupby method that outputs me the total present values within each subgroup. So, desired output for something like this would be:
interview longitude latitude occurs
1 A1 2 2 4
2 A2 1 1 1
3 A3 2 2 2
4 A4 1 1 1
5 A5 1 1 1
6 A7 0 0 1
I tried to use this command to try with latitudes, but not getting the desired output:
df.groupby(by=['interview', 'latitude'])['interview'].count()
Thanks!
Upvotes: 2
Views: 1477
Reputation: 3993
These three different ways how you can do it :
import pandas as pd
import numpy as np
data = np.array([
['A1', 'A1', 'A3' ,'A4' ,'A2' ,'A1' ,'A7', 'A1', 'A3', 'A5'],
[34.2, 54.2, 32.1, 54.3, 45.1, np.NaN ,np.NaN ,np.NaN, 23.1, -23.7],
[ 90.2, 23.5, 21.5, 93.1, 29.5, np.NaN,np.NaN ,np.NaN ,38.2, -98.4]])
df = pd.DataFrame({'**interview**':data[0,:],'**longitude**':data[1,:],'**latitude**':data[2,:]})
#first way
df['**occurs**']=1
print(df.groupby('**interview**')[['**longitude**','**latitude**','**occurs**']].count().\
reset_index())
#or
counts=0
gb = df.groupby(['**interview**'])
gb1 = df.groupby(['**interview**','**latitude**'])
counts = gb.size().to_frame(name='**occurs**')
print((counts
.join(gb1.agg({'**longitude**':lambda x: x.notnull().size}))
.join(gb1.agg({'**latitude**': lambda x: x.notnull().size}).rename(columns={'**latitude**': '*latitude*'}))
.reset_index()
))
#second way
counts=0
gb = df.groupby(['**interview**'])
counts = gb.size().to_frame(name='**occurs**')
print((counts
.join(gb.agg({'**longitude**': 'size'}))
.join(gb.agg({'**latitude**': 'size'}))
.reset_index()
))
#Third way this just for compare
print(df.groupby(['**interview**']).agg({'**longitude**':lambda x: x.notnull().sum(),
'**latitude**':lambda x: x.notnull().sum(),
'**interview**': 'size'})\
.rename(columns={'**interview**':'**occurs**'}))
See the code here:https://repl.it/repls/OldVirtuousBooleanvalue
Upvotes: 0
Reputation: 38415
No need to use agg, simply pass the columns to groupby. Count returns count of not null values
df.groupby('interview')[['interview','longitude','latitude']].count()
interview longitude latitude
interview
A1 4 2 2
A2 1 1 1
A3 2 2 2
A4 1 1 1
A5 1 1 1
A7 1 0 0
Upvotes: 2
Reputation: 323226
Using notna
before groupby
+ sum
s1=(df[['**longitude**','**latitude**']].notna()).groupby(df['**interview**']).sum()
s2=df.groupby(df['**interview**']).size()# note size will count the NaN value as well
pd.concat([s1,s2.to_frame('**occurs** ')],axis=1)
Out[115]:
**longitude** **latitude** **occurs**
**interview**
A1 2.0 2.0 4
A2 1.0 1.0 1
A3 2.0 2.0 2
A4 1.0 1.0 1
A5 1.0 1.0 1
A7 0.0 0.0 1
Upvotes: 2