Reputation: 6290
I have a Pandas DataFrame which looks as follows:
ID1 ID2 timestamp x y
0 0 43 1 40
0 0 53 20 41
0 0 63 21 41
0 1 73 5 100
0 1 75 6 99
0 1 83 7 87
1 0 100 34 23
1 0 200 0 0
1 0 210 0 22
1 0 222 22 15
2 0 300 22 15
2 1 450 22 15
2 1 451 22 15
Now, for each group of ID1 and ID2 I would like to calculate the time difference (max - min timestamp) and the mean of x and y. The resulting dataframe should look as follows:
ID1 ID2 timestamp x y
0 0 20 14 40.6
0 1 10 6 95.3
1 0 110 14 15
2 0 300 22 15
2 1 1 22 15
How can this be done? I could use df.groupby(["ID1", "ID2"])
to group but I don't know how to calculate the difference and mean in situ.
Upvotes: 3
Views: 355
Reputation: 323226
I will using np.ptp
df.groupby(['ID1','ID2'],as_index=False).agg({'timestamp':np.ptp,'x':'mean','y':'mean'})
ID1 ID2 timestamp x y
0 0 0 20 14 40.666667
1 0 1 10 6 95.333333
2 1 0 122 14 15.000000
3 2 0 0 22 15.000000
4 2 1 1 22 15.000000
Upvotes: 1
Reputation: 1488
As you didn't post what you tried, I guess a read at this piece of documentation may be of great help to your question.
At the beginning of that page, the split-apply-combine method is explained.
Hope this helps.
Upvotes: -1
Reputation: 8097
Should be something like:
df.groupby(["ID1", "ID2"]).agg({
'timestamp': lambda x: x.max() - x.min(),
'x': 'mean',
'y': 'mean'
}).reset_index()
Upvotes: 3