Reputation: 3993
I have a time series dataframe where the logs of a particular id may run from several minutes to hours. For visualisation purposes, I have to slice the logs of each id down to its first 15 minutes of data.
Take the example of the df below:
df = pd.DataFrame(
{'id': [15,15,15,15,15,64,64,64,64,64],
'timestamp': ['2016-04-01 00:05:00','2016-04-01 00:10:20','2016-04-01 00:13:01',
'2016-04-01 00:14:00','2016-04-01 00:16:00','2016-04-01 21:04:59',
'2016-04-01 21:13:05','2016-04-01 21:20:00','2016-04-01 21:25:25',
'2016-04-01 21:59:59']}
)
df
id timestamp
0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 15 2016-04-01 00:16:00
5 64 2016-04-01 21:04:59
6 64 2016-04-01 21:13:05
7 64 2016-04-01 21:20:00
8 64 2016-04-01 21:25:25
9 64 2016-04-01 21:59:59
By limiting the data of each id to 15 minutes from its start, I want to have the following new_df:
new_df
id timestamp
0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 64 2016-04-01 21:04:59
5 64 2016-04-01 21:13:05
How do I achieve this?
Upvotes: 1
Views: 40
Reputation: 13582
First, make sure the timestamp column is of datetime dtype (using pandas.to_datetime):
df['timestamp'] = pd.to_datetime(df['timestamp'])
Considering that OP wants the timer to start at the first timestamp of a given id, the following will do the work:
new_df = df.groupby('id').apply(lambda x: x[x.timestamp <= x.timestamp.iloc[0] + pd.Timedelta(minutes=15)])
[Out]:
id timestamp
id
15 0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 15 2016-04-01 00:16:00
64 5 64 2016-04-01 21:04:59
6 64 2016-04-01 21:13:05
If OP wants the timer to start at 0 hours, 0 minutes, and 0 seconds (midnight of the day of the first timestamp), then the following will do the work:
new_df2 = df.groupby('id').apply(lambda x: x[x.timestamp <= x.timestamp.iloc[0].replace(hour=0, minute=0, second=0) + pd.Timedelta(minutes=15)])
[Out]:
id timestamp
id
15 0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
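As a rough sketch of the same midnight-based cutoff without apply, one could compute the start of the day per id with groupby(...).transform and Series.dt.normalize. The names day_start and midnight_df are illustrative, not from the answer; note that normalize also zeroes microseconds, which replace(hour=0, minute=0, second=0) does not.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [15, 15, 15, 15, 15, 64, 64, 64, 64, 64],
    'timestamp': pd.to_datetime([
        '2016-04-01 00:05:00', '2016-04-01 00:10:20', '2016-04-01 00:13:01',
        '2016-04-01 00:14:00', '2016-04-01 00:16:00', '2016-04-01 21:04:59',
        '2016-04-01 21:13:05', '2016-04-01 21:20:00', '2016-04-01 21:25:25',
        '2016-04-01 21:59:59',
    ]),
})

# Midnight of the day on which each id's earliest log was recorded,
# broadcast back to every row of that id.
day_start = df.groupby('id')['timestamp'].transform('min').dt.normalize()

# Keep rows that fall within 15 minutes of that midnight.
midnight_df = df[df['timestamp'] <= day_start + pd.Timedelta(minutes=15)]
print(midnight_df)
```

With the sample data only the first four rows of id 15 survive, because every log of id 64 is recorded after 21:00.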
Notes:
In both cases, the result differs from the new_df that OP shares in the question.
In the first operation, x.timestamp.iloc[0] selects the first timestamp of the group, which is used as the start of the window (this assumes the rows of each id are sorted by time). The second operation does the same, but instead of keeping that value it replaces the time part with 0 hours, 0 minutes, 0 seconds.
.iloc allows one to get rows/columns at integer locations. As we want the first row, it is .iloc[0].
pandas.Timedelta represents a duration. Here it is the 15-minute window that is added to the start time to get the cutoff.
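To make the arithmetic in these notes concrete, here is a small sketch for the first three logs of id 64 only. The variable names (ts, start, cutoff, midnight, mask) are illustrative.

```python
import pandas as pd

# First three logs of id 64 from the question.
ts = pd.Series(pd.to_datetime([
    '2016-04-01 21:04:59', '2016-04-01 21:13:05', '2016-04-01 21:20:00',
]))

start = ts.iloc[0]                          # first timestamp: 21:04:59
cutoff = start + pd.Timedelta(minutes=15)   # end of the window: 21:19:59

# Second variant: reset the time part to midnight before adding the window.
midnight = start.replace(hour=0, minute=0, second=0)

# Element-wise comparison gives the boolean mask used to filter the rows.
mask = ts <= cutoff
print(cutoff, midnight, mask.tolist())
```

The 21:20:00 log is one second past the cutoff, which is why id 64 keeps only two rows in the first approach.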
Upvotes: 0
Reputation: 14238
Your new_df doesn't match the output you would get according to your description: the 00:16:00 row of id 15 is still within 15 minutes of that id's first log.
You can use the code below and check it against your desired output:
df['timestamp'] = pd.to_datetime(df['timestamp'])
d = df.groupby('id').transform(lambda x: (x-x.min()))
d['timestamp'] = d['timestamp'].apply(pd.Timedelta.total_seconds).div(60)
new_df = df[d['timestamp'] <= 15]
print(new_df)
id timestamp
0 15 2016-04-01 00:05:00
1 15 2016-04-01 00:10:20
2 15 2016-04-01 00:13:01
3 15 2016-04-01 00:14:00
4 15 2016-04-01 00:16:00
5 64 2016-04-01 21:04:59
6 64 2016-04-01 21:13:05
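A slightly more compact sketch of the same idea compares the elapsed time directly against pd.Timedelta(minutes=15), skipping the conversion to minutes. The names start and within_window are illustrative.

```python
import pandas as pd

df = pd.DataFrame({
    'id': [15, 15, 15, 15, 15, 64, 64, 64, 64, 64],
    'timestamp': pd.to_datetime([
        '2016-04-01 00:05:00', '2016-04-01 00:10:20', '2016-04-01 00:13:01',
        '2016-04-01 00:14:00', '2016-04-01 00:16:00', '2016-04-01 21:04:59',
        '2016-04-01 21:13:05', '2016-04-01 21:20:00', '2016-04-01 21:25:25',
        '2016-04-01 21:59:59',
    ]),
})

# Earliest timestamp of each id, aligned with the original rows.
start = df.groupby('id')['timestamp'].transform('min')

# True where the log is at most 15 minutes after its id's first log.
within_window = (df['timestamp'] - start) <= pd.Timedelta(minutes=15)

new_df = df[within_window]
print(new_df)
```

Because it uses min rather than the first row, this version does not depend on the rows being sorted by time within each id.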
Upvotes: 1