Reputation: 3096
Given the following dataframe:
import pandas as pd
pd.DataFrame({"start": ["2017-01-01 13:09:01", "2017-01-01 13:09:07", "2017-01-01 13:09:12"],
"end": ["2017-01-01 13:09:05", "2017-01-01 13:09:09", "2017-01-01 13:09:14"],
"status": ["OK", "ERROR", "OK"]})
HAVE:
| start | end | status |
|---------------------|---------------------|--------|
| 2017-01-01 13:09:01 | 2017-01-01 13:09:05 | OK |
| 2017-01-01 13:09:07 | 2017-01-01 13:09:09 | ERROR |
| 2017-01-01 13:09:12 | 2017-01-01 13:09:14 | OK |
I want to convert it to another format, that is, "unfold" the intervals and make them into a DatetimeIndex, and resample the data. The result should look like this:
WANT:
| | status |
|---------------------|-----------|
| 2017-01-01 13:09:01 | OK |
| 2017-01-01 13:09:02 | OK |
| 2017-01-01 13:09:03 | OK |
| 2017-01-01 13:09:04 | OK |
| 2017-01-01 13:09:05 | OK |
| 2017-01-01 13:09:06 | NAN |
| 2017-01-01 13:09:07 | ERROR |
| 2017-01-01 13:09:08 | ERROR |
| 2017-01-01 13:09:09 | ERROR |
| 2017-01-01 13:09:10 | NAN |
| 2017-01-01 13:09:11 | NAN |
| 2017-01-01 13:09:12 | OK |
| 2017-01-01 13:09:13 | OK |
| 2017-01-01 13:09:14 | OK |
Any help is very much appreciated!
Upvotes: 7
Views: 2558
Reputation: 33783
Using IntervalIndex
:
# create an IntervalIndex from start/end
iv_idx = pd.IntervalIndex.from_arrays(df['start'], df['end'], closed='both')
# generate the desired index of individual times
new_idx = pd.date_range(df['start'].min(), df['end'].max(), freq='s')
# set the index of 'status' as the IntervalIndex, then reindex to the new index
result = df['status'].set_axis(iv_idx, inplace=False).reindex(new_idx)
The resulting output for result
:
2017-01-01 13:09:01 OK
2017-01-01 13:09:02 OK
2017-01-01 13:09:03 OK
2017-01-01 13:09:04 OK
2017-01-01 13:09:05 OK
2017-01-01 13:09:06 NaN
2017-01-01 13:09:07 ERROR
2017-01-01 13:09:08 ERROR
2017-01-01 13:09:09 ERROR
2017-01-01 13:09:10 NaN
2017-01-01 13:09:11 NaN
2017-01-01 13:09:12 OK
2017-01-01 13:09:13 OK
2017-01-01 13:09:14 OK
Freq: S, Name: status, dtype: object
Upvotes: 6
Reputation: 153460
Let's try this, using apply
and rebuild series with date_range
, then resample
to fill missing time populated with NaN from asfreq
:
df.apply(lambda x: pd.Series(index=pd.date_range(x['start'],
x['end'],
freq='S'),
data=x['status']), axis=1)\
.T\
.stack().reset_index(level=1, drop=True)\
.resample('S').asfreq()
Output:
2017-01-01 13:09:01 OK
2017-01-01 13:09:02 OK
2017-01-01 13:09:03 OK
2017-01-01 13:09:04 OK
2017-01-01 13:09:05 OK
2017-01-01 13:09:06 NaN
2017-01-01 13:09:07 ERROR
2017-01-01 13:09:08 ERROR
2017-01-01 13:09:09 ERROR
2017-01-01 13:09:10 NaN
2017-01-01 13:09:11 NaN
2017-01-01 13:09:12 OK
2017-01-01 13:09:13 OK
2017-01-01 13:09:14 OK
Freq: S, dtype: object
Upvotes: 2