Reputation: 71
My data looks like the example below: each row has a type and a timestamp, and I want to subtract the timestamps from each other within each type.
import numpy as np
import pandas as pd
data = [["a",12],["a",13],["a",15],["b",32],["b",34],["b",37]]
df = pd.DataFrame(data)
df.columns = ['type', 'time']
df["diff"] = df.groupby("type")["time"].diff()
df
  type  time  diff
0    a    12   NaN
1    a    13   1.0
2    a    15   2.0
3    b    32   NaN
4    b    34   2.0
5    b    37   3.0
But instead of the default behaviour (difference from the previous row), I want to compare every timestamp (rows 1, 2 and 4, 5) to the first timestamp of its type, so the diff for rows 2 and 5 should be 3.0 and 5.0. How can I do this? Thanks!
Upvotes: 1
Views: 242
Reputation: 1116
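I think using .cumsum() would suffice: within each group, the running sum of the consecutive differences is the difference from the group's first timestamp.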
import numpy as np
import pandas as pd
data = [["a",12],["a",13],["a",15],["b",32],["b",34],["b",37]]
df = pd.DataFrame(data)
df.columns = ['type', 'time']
df["diff"] = df.groupby("type")["time"].diff().fillna(0)
df["diff"] = df.groupby("type")["diff"].cumsum()
print(df)
>>>
  type  time  diff
0    a    12   0.0
1    a    13   1.0
2    a    15   3.0
3    b    32   0.0
4    b    34   2.0
5    b    37   5.0
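As a quick sanity check of the output above, here is a minimal sketch on a single group. The `times` series is a hypothetical stand-in that reuses the "a" timestamps; the names `steps`, `running` and `offset` are only for illustration.

```python
import pandas as pd

# Hypothetical single-group series reusing the "a" timestamps from the question.
times = pd.Series([12, 13, 15])

# Consecutive differences; the first row has no predecessor, so fill it with 0.
steps = times.diff().fillna(0)

# The running sum of the steps...
running = steps.cumsum()

# ...matches subtracting the first timestamp from every row.
offset = times - times.iloc[0]

print(running.tolist())  # [0.0, 1.0, 3.0]
print(offset.tolist())   # [0, 1, 3]
```

The values agree; only the dtype differs, because diff() introduces NaN and therefore floats.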
BTW, this code also works:
import numpy as np
import pandas as pd
data = [["a",12],["a",13],["a",15],["b",32],["b",34],["b",37]]
df = pd.DataFrame(data)
df.columns = ['type', 'time']
df["diff"] = df['time'] - df.groupby("type")["time"].transform('first')
print(df)
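To compare the two approaches side by side, here is a small sketch that computes both on the question's data and checks that they agree. The variable names (`step`, `via_cumsum`, `via_first`) are hypothetical, chosen only for this illustration.

```python
import pandas as pd

data = [["a", 12], ["a", 13], ["a", 15], ["b", 32], ["b", 34], ["b", 37]]
df = pd.DataFrame(data, columns=["type", "time"])

# Route 1: cumulative sum of the per-group consecutive differences.
step = df.groupby("type")["time"].diff().fillna(0)
via_cumsum = step.groupby(df["type"]).cumsum()

# Route 2: subtract each group's first timestamp directly.
via_first = df["time"] - df.groupby("type")["time"].transform("first")

print(via_first.tolist())                         # [0, 1, 3, 0, 2, 5]
print(via_cumsum.tolist() == via_first.tolist())  # True
```

One difference worth knowing: the transform('first') version keeps the integer dtype of the time column, while the diff/cumsum version yields floats. Both assume the rows of each type are already in time order; if they might not be, transform('min') could be used in place of 'first', under the assumption that the earliest timestamp is the intended reference.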
Upvotes: 1