Reputation: 320
I would like to calculate the max difference between n categories. Please note that the timestamp will differ in their milliseconds, but I only want to calculate the difference rowwise within a set time interval.
My data looks like this:
TimeStamp Category Value
2020-12-07 07:44:08.679 A 7.40
2020-12-07 07:44:08.682 B 8.18
2020-12-07 07:44:08.685 C 6.95
2020-12-07 07:44:11.271 A 249.30
2020-12-07 07:44:11.272 B 249.75
2020-12-07 07:44:11.273 C 250.58
2020-12-07 07:44:13.746 A 250.18
2020-12-07 07:44:13.747 B 250.78
2020-12-07 07:44:13.748 C 251.33
2020-12-07 07:44:16.225 A 249.87
2020-12-07 07:44:16.226 B 251.48
2020-12-07 07:44:16.227 C 251.43
And I want to achieve something like this:
TimeStamp Difference
2020-12-07 07:44:08 1,23
2020-12-07 07:44:11 1,28
2020-12-07 07:44:13 1,15
2020-12-07 07:44:16 1,61
But I am stuck where I could truncate or group the TimeStamps to their precise seconds.
Upvotes: 0
Views: 47
Reputation:
I recreated your data frame with this:
df = pd.read_csv(io.StringIO(
"""
TimeStamp,Category,Value
2020-12-07 07:44:08.679, A, 7.40
2020-12-07 07:44:08.682, B, 8.18
2020-12-07 07:44:08.685, C, 6.95
2020-12-07 07:44:11.271, A, 249.30
2020-12-07 07:44:11.272, B, 249.75
2020-12-07 07:44:11.273, C, 250.58
2020-12-07 07:44:13.746, A, 250.18
2020-12-07 07:44:13.747, B, 250.78
2020-12-07 07:44:13.748, C, 251.33
2020-12-07 07:44:16.225, A, 249.87
2020-12-07 07:44:16.226, B, 251.48
2020-12-07 07:44:16.227, C, 251.43
"""
), sep=",")
df.TimeStamp = pd.to_datetime(df.TimeStamp)
I made some assumptions before tackling the task. First, I assume you don't actually use the Category
variable so I drop that column.
df = df.drop("Category", axis=1)
``
I also remove the micro seconds as we are going to group on the seconds level.
```python
df.TimeStamp = df.TimeStamp.apply(lambda x: x.replace(microsecond=0))
Then I use the groupby
function on the TimeStamp
column and use an aggregate function which subtracts the minimum value in the group by the maximum value as such:
df = df.groupby(df["TimeStamp"]).agg(lambda x: max(x) - min(x))
And we end up with this data frame:
Value
TimeStamp
2020-12-07 07:44:08 1.23
2020-12-07 07:44:11 1.28
2020-12-07 07:44:13 1.15
2020-12-07 07:44:16 1.61
Upvotes: 1