vojta

Reputation: 122

start - end time in pandas column

I have the following problem: for each id in df, I would like to compute the time span over which it occurs, i.e. its latest time minus its earliest time.

data = {'id': [2, 2, 2, 4, 4], 'time': ['22:17:46', '22:19:02', '22:06:00', '22:18:06', '22:18:06']}
pd.DataFrame.from_dict(data)

In the example above, the desired output for id 2 is 22:19:02 - 22:06:00 = 13 min 2 s, i.e. about 13.03 minutes. Similarly, it is 0 for id 4. In the end, what I want is:

data = {'id': [2, 2, 4, 2, 4], 'time': ['22:17:46', '22:19:02', '22:18:06', '22:06:00', '22:18:06'], 'time_diff': [13.03, 13.03, 0, 13.03, 0]}

I tried this, but it is terribly slow on large data:

import datetime

fmt = "%H:%M:%S"  # assumed; the format string was not shown in the original post

data["max_time"] = data.groupby(["id"])["time"].transform("max").astype(str)
data["min_time"] = data.groupby(["id"])["time"].transform("min").astype(str)

data["time_diff"] = 0.0

# Row-by-row loop: this is the slow part on large data
for i in range(len(data)):
    data.loc[i, "time_diff"] = (
        datetime.datetime.strptime(data.loc[i, "max_time"], fmt)
        - datetime.datetime.strptime(data.loc[i, "min_time"], fmt)
    ).total_seconds()

How can I do it please?

Upvotes: 2

Views: 252

Answers (1)

Use the transform function. For each group produced by the groupby on id, it computes the difference between the maximum and minimum time and broadcasts that value back to every row of the group. Cast the time column to a datetime data type first:

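import pandas as pd

data = {'id': [2, 2, 2, 4, 4], 'time': ['22:17:46', '22:19:02', '22:06:00', '22:18:06', '22:18:06']}
df = pd.DataFrame.from_dict(data)

# Parse the time strings; with no date given, pandas fills in the current date
df['time'] = pd.to_datetime(df['time'])
# Per-id span (max - min), broadcast back to every row of that id
df['time_diff'] = df.groupby('id')['time'].transform(lambda x: x.max() - x.min())
print(df)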

Output:

   id                time       time_diff
0   2 2021-11-16 22:17:46 0 days 00:13:02
1   2 2021-11-16 22:19:02 0 days 00:13:02
2   2 2021-11-16 22:06:00 0 days 00:13:02
3   4 2021-11-16 22:18:06 0 days 00:00:00
4   4 2021-11-16 22:18:06 0 days 00:00:00
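
The result above is a timedelta, while the question asks for minutes. A possible variation, offered only as a sketch: it assumes all times fall on the same day, parses them with pd.to_timedelta so that no date gets attached, and uses the built-in 'max' and 'min' aggregations instead of a lambda, which is usually faster on large data. The names t, grouped and span are illustrative, not from the original post.

```python
import pandas as pd

data = {'id': [2, 2, 2, 4, 4],
        'time': ['22:17:46', '22:19:02', '22:06:00', '22:18:06', '22:18:06']}
df = pd.DataFrame(data)

# Parse the clock times as durations since midnight (assumes same-day times)
t = pd.to_timedelta(df['time'])

# Built-in 'max'/'min' aggregations avoid calling a Python lambda per group
grouped = t.groupby(df['id'])
span = grouped.transform('max') - grouped.transform('min')

# Convert the timedelta span to minutes, rounded to two decimals
df['time_diff'] = (span.dt.total_seconds() / 60).round(2)
print(df)
```

With the sample data this gives 13.03 for every row of id 2 and 0.0 for every row of id 4, and the time column keeps its original strings.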

Upvotes: 2
