Reputation: 1014
I have a dataset with 15-minutes observations for different stations for 20 years. I want to know the range time that each station has data.
station_id | start_time | end_time | observation |
---|---|---|---|
2 | 2000-01-02 01:00:00 | 2000-01-02 01:15:00 | 50 |
2 | 2000-01-02 01:15:00 | 2000-01-02 01:30:00 | 15 |
2 | 2000-02-02 01:30:00 | 2000-01-02 01:45:00 | 3 |
3 | 2000-01-02 05:00:00 | 2000-01-02 05:15:00 | 10 |
3 | 2000-01-02 05:15:00 | 2000-01-02 05:30:00 | 2 |
3 | 2000-02-03 01:00:00 | 2000-01-02 01:15:00 | 15 |
3 | 2000-02-04 01:00:00 | 2000-01-02 01:15:00 | 20 |
an example of I want to have
|station_id | start | end | years |days
| 2 |2000-01-02 01:00:00|2000-01-02 01:45:00| 1 | 1
| 3 |2000-01-02 05:00:00|2000-01-02 01:15:00| 1 | 1
Upvotes: 0
Views: 34
Reputation: 71560
Try using groupby
, diff
, abs
, agg
and assign
:
df[['start_time', 'end_time']] = df[['start_time', 'end_time']].apply(pd.to_datetime)
x = df.groupby('station_id').agg({'start_time': 'first', 'end_time': 'last'})
temp = x.diff(axis=1).abs()['end_time']
x = x.assign(years=temp.dt.days // 365, days=temp.dt.days % 365).reset_index()
print(x)
Upvotes: 1