How to use numpy diff for non out[i] = a[i+1] - a[i] differences?

Question

My data looks like this: each row has a type and a timestamp, and I want to subtract the timestamps from each other within each type.

import numpy as np
import pandas as pd
data = [["a",12],["a",13],["a",15],["b",32],["b",34],["b",37]]
df = pd.DataFrame(data)
df.columns = ['type', 'time']
df["diff"] = df.groupby("type")["time"].diff()
df

  type  time  diff
0    a    12   NaN
1    a    13   1.0
2    a    15   2.0
3    b    32   NaN
4    b    34   2.0
5    b    37   3.0

Instead of this default behaviour, I want to compare every later timestamp (rows 1, 2 and 4, 5) to the first timestamp of its type, so the diff for rows 2 and 5 should be 3.0 and 5.0. How can I do this? Thanks!

armamut · Accepted Answer

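I think using .cumsum() would suffice: within each group, the running sum of the consecutive differences is the difference from the group's first value.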

import numpy as np
import pandas as pd
data = [["a",12],["a",13],["a",15],["b",32],["b",34],["b",37]]
df = pd.DataFrame(data)
df.columns = ['type', 'time']
df["diff"] = df.groupby("type")["time"].diff().fillna(0)
df["diff"] = df.groupby("type")["diff"].cumsum()
print(df)
>>>
  type  time  diff
0    a    12   0.0
1    a    13   1.0
2    a    15   3.0
3    b    32   0.0
4    b    34   2.0
5    b    37   5.0
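
To see why the cumulative sum gives the difference from the first value, here is a small check in plain NumPy on the timestamps of type "a" only. It is just a sketch of the idea; the variable names are made up for illustration.

```python
import numpy as np

a = np.array([12, 13, 15])       # timestamps of type "a" from the question

steps = np.diff(a)               # consecutive differences: a[i+1] - a[i]
from_first = np.cumsum(steps)    # running sum of those differences

# The intermediate terms cancel, so each running sum equals a[i] - a[0].
assert np.array_equal(from_first, a[1:] - a[0])
print(from_first)
```

Note that np.diff returns one element fewer than its input, which is why the pandas version above fills the leading NaN with 0 before calling cumsum.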

BTW, this code also works; it subtracts each group's first timestamp directly:

import numpy as np
import pandas as pd
data = [["a",12],["a",13],["a",15],["b",32],["b",34],["b",37]]
df = pd.DataFrame(data)
df.columns = ['type', 'time']
df["diff"] = df['time'] - df.groupby("type")["time"].transform('first')
print(df)
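
Since the title asks about NumPy, the same "subtract the group's first value" idea can probably be written without pandas as well. The following is a minimal sketch, assuming the types and timestamps are available as two plain arrays (the names types, times and diff_from_first are hypothetical). It uses np.unique with return_index and return_inverse to look up the first timestamp of each type.

```python
import numpy as np

# Same data as in the question, split into two arrays (assumed layout).
types = np.array(["a", "a", "a", "b", "b", "b"])
times = np.array([12, 13, 15, 32, 34, 37])

# first_idx[k] is the position of the first occurrence of the k-th unique type;
# inverse[i] says which unique type row i belongs to.
_, first_idx, inverse = np.unique(types, return_index=True, return_inverse=True)

# Look up each row's group-first timestamp and subtract it.
diff_from_first = times - times[first_idx[inverse]]
print(diff_from_first)  # [0 1 3 0 2 5]
```

This does not require the rows to be sorted by type, because return_index always points at the first occurrence of each value.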
