Reputation: 91
I'm trying to group items by ID then count the number of non-zero values by ID and assign that value to a new column. I also need to sum those PMN values by ID then drop duplicate ID values. For example:
d = {'ID': [156, 156, 156, 156, 157, 157, 157, 157],'PMN': [8.6, 8.4, 0, 10, 20, 0, 0, 5]}
df = pd.DataFrame(data=d)
I thought this might work for count and sum:
df['NodeDegree'] = df.groupby(['ID'])['PMN'].count()
df['UpdateFlow'] = df.groupby(['ID'])['PMN'].sum()
df.drop_duplicates(subset=['ID'], keep='last', inplace=True)
But it just return NaN for the NodeDegree values, and i'm not incorporating the !=0. Also, is it possible to make these changes in place?
Pointers please :)
Thanks!
Upvotes: 1
Views: 506
Reputation: 195438
I hope I've understood you correctly:
x = df.groupby("ID")["PMN"].agg(
NodeDegree=lambda x: x[x != 0].count(), UpdateFlow="sum"
)
print(x)
Prints:
NodeDegree UpdateFlow
ID
156 3 27.0
157 2 25.0
Upvotes: 1