machinery
machinery

Reputation: 6290

Transform each group in a DataFrame

I have the following DataFrame:

  id   x    y        timestamp  sensorTime
   1   32   30        1031      2002
   1   4    105       1035      2005
   1   8    110       1050      2006
   2   18   10        1500      3600
   2   40   20        1550      3610
   2   80   10        1450      3620
....


import pandas as pd
import numpy as np
df = pd.DataFrame(np.array([[1,1,1,2,2,2], [32,4,8,18,40,80], [30,105,110,10,20,10], [1031,1035,1050,1500,1550,1450], [2002, 2005, 2006, 3600, 3610, 3620]])).T
df.columns = ['id', 'x', 'y', 'timestamp', 'sensorTime]

For each group grouped by id I would like to add the differences of the sensorTime to the first value of timestamp. Something like the following:

start = df.iloc[0]['timestamp']
df['sensorTime'] -= df.iloc[0]['sensorTime']
df['sensorTime'] += start

But I would like to do this for each id group separately.

The resulting DataFrame should be:

  id   x    y        timestamp  sensorTime
   1   32   30        1031      1031
   1   4    105       1035      1034
   1   8    110       1050      1035
   2   18   10        1500      1500
   2   40   20        1550      1510
   2   80   10        1450      1520
....

How can this operation done per group?

Upvotes: 2

Views: 40

Answers (2)

sammywemmy
sammywemmy

Reputation: 28644

You could run a groupby twice, first, to get the difference in sensorTime, the second time to do the cumulative sum:

box = df.groupby("id").sensorTime.transform("diff")
df.assign(
    new_sensorTime=np.where(box.isna(), df.timestamp, box),
    new=lambda x: x.groupby("id")["new_sensorTime"].cumsum(),
).drop(columns="new_sensorTime")


    id  x   y   timestamp   sensorTime  new
0   1   32  30      1031    2002    1031.0
1   1   4   105     1035    2005    1034.0
2   1   8   110     1050    2006    1035.0
3   2   18  10      1500    3600    1500.0
4   2   40  20      1550    3610    1510.0
5   2   80  10     1450     3620    1520.0

Upvotes: 1

ggaurav
ggaurav

Reputation: 1804

df

    id  x   y timestamp sensorTime
0   1   32  30  1031    2002
1   1   4   105 1035    2005
2   1   8   110 1050    2006
3   2   18  10  1500    3600
4   2   40  20  1550    3610
5   2   80  10  1450    3620

You can group by id and then pass both timestamp and sensorTime. Then you can use diff to get the difference of sensorTime. The first value would be NaN and you can replace it with the first value of timestamp of that group. Then you can simply do cumsum to get the desired output.

def func(x):
    diff = x['sensorTime'].diff()
    diff.iloc[0] = x['timestamp'].iloc[0]
    return (diff.cumsum().to_frame())

df['sensorTime'] = df.groupby('id')[['timestamp', 'sensorTime']].apply(func)
df
    id  x   y timestamp sensorTime
0   1   32  30  1031    1031.0
1   1   4   105 1035    1034.0
2   1   8   110 1050    1035.0
3   2   18  10  1500    1500.0
4   2   40  20  1550    1510.0
5   2   80  10  1450    1520.0

Upvotes: 1

Related Questions