Reputation: 122
I have the following problem. I would like to compute the time span over which each id occurs in df, i.e. the difference between its latest and earliest time.
import pandas as pd
data = {'id': [2, 2, 2, 4, 4], 'time': ['22:17:46', '22:19:02', '22:06:00', '22:18:06', '22:18:06']}
df = pd.DataFrame.from_dict(data)
In the example above, the desired output would be 22:19:02 - 22:06:00 = 13.03 minutes for id 2. Similarly, 0 for id 4. Finally, what I want is:
data = {'id': [2, 2, 4, 2, 4], 'time': ['22:17:46', '22:19:02', '22:18:06', '22:06:00', '22:18:06'], 'time_diff': [13.03, 13.03, 0, 13.03, 0]}
I tried this, but it is terribly slow on large data:
import datetime
fmt = "%H:%M:%S"  # assumed time format; the original `format` value was not shown
df["max_time"] = df.groupby(["id"])["time"].transform("max").astype(str)
df["min_time"] = df.groupby(["id"])["time"].transform("min").astype(str)
df["time_diff"] = 0.0
for i in range(len(df)):
    # Row-by-row parsing in Python is what makes this slow on large data
    df.loc[i, "time_diff"] = (
        datetime.datetime.strptime(df["max_time"][i], fmt)
        - datetime.datetime.strptime(df["min_time"][i], fmt)
    ).total_seconds()
How can I do this faster, please?
Upvotes: 2
Views: 252
Reputation: 4233
Use the transform function. It applies a function to each group produced by the groupby on id and broadcasts the result back to every row of that group, so each row receives its group's max minus min. Cast the time column to a datetime data type first.
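import pandas as pd
data = {'id': [2, 2, 2, 4, 4], 'time': ['22:17:46', '22:19:02', '22:06:00', '22:18:06', '22:18:06']}
df = pd.DataFrame.from_dict(data)
# Parse the time strings as datetimes (a date is filled in, as the output shows)
df['time'] = pd.to_datetime(df['time'])
# Range (max - min) per id, broadcast back to every row of that id
df['time_diff'] = df.groupby('id')['time'].transform(lambda x: x.max() - x.min())
print(df)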
Output:
   id                time       time_diff
0   2 2021-11-16 22:17:46 0 days 00:13:02
1   2 2021-11-16 22:19:02 0 days 00:13:02
2   2 2021-11-16 22:06:00 0 days 00:13:02
3   4 2021-11-16 22:18:06 0 days 00:00:00
4   4 2021-11-16 22:18:06 0 days 00:00:00
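The result above is a Timedelta column, while the question asks for minutes as a number. Here is a minimal sketch of one way to get that. It assumes the times are all `HH:MM:SS` strings from the same day (no crossing midnight), and it swaps the lambda for the built-in "max" and "min" aggregations, which are usually faster on large data because no Python function is called per group. The intermediate names `t`, `grouped` and `span` are illustrative only.

```python
import pandas as pd

data = {
    "id": [2, 2, 2, 4, 4],
    "time": ["22:17:46", "22:19:02", "22:06:00", "22:18:06", "22:18:06"],
}
df = pd.DataFrame(data)

# Parse once, with an explicit format (assumption: every value is HH:MM:SS).
t = pd.to_datetime(df["time"], format="%H:%M:%S")

# Group the parsed times by id and use the built-in aggregations;
# transform() broadcasts each group's value back to its rows.
grouped = t.groupby(df["id"])
span = grouped.transform("max") - grouped.transform("min")

# Convert the Timedelta span to minutes, rounded to two decimals.
df["time_diff"] = (span.dt.total_seconds() / 60).round(2)
print(df)
```

With the sample data this gives 13.03 for every row of id 2 and 0.0 for every row of id 4. Drop the `/ 60` to keep seconds, as in the loop from the question.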
Upvotes: 2