GeoBeez
GeoBeez

Reputation: 1014

find the date range of groupped data in a dataframe

I have a dataset with 15-minutes observations for different stations for 20 years. I want to know the range time that each station has data.

station_id start_time end_time observation
2 2000-01-02 01:00:00 2000-01-02 01:15:00 50
2 2000-01-02 01:15:00 2000-01-02 01:30:00 15
2 2000-02-02 01:30:00 2000-01-02 01:45:00 3
3 2000-01-02 05:00:00 2000-01-02 05:15:00 10
3 2000-01-02 05:15:00 2000-01-02 05:30:00 2
3 2000-02-03 01:00:00 2000-01-02 01:15:00 15
3 2000-02-04 01:00:00 2000-01-02 01:15:00 20

an example of I want to have

|station_id |      start        |      end          |     years    |days
|    2      |2000-01-02 01:00:00|2000-01-02 01:45:00|       1     | 1
|    3      |2000-01-02 05:00:00|2000-01-02 01:15:00|       1     | 1

Upvotes: 0

Views: 34

Answers (1)

U13-Forward
U13-Forward

Reputation: 71560

Try using groupby, diff, abs, agg and assign:

df[['start_time', 'end_time']] = df[['start_time', 'end_time']].apply(pd.to_datetime)
x = df.groupby('station_id').agg({'start_time': 'first', 'end_time': 'last'})
temp = x.diff(axis=1).abs()['end_time']
x = x.assign(years=temp.dt.days // 365, days=temp.dt.days % 365).reset_index()
print(x)

Upvotes: 1

Related Questions