Smart indexing by time in pandas

Question

I have dataframe like that:

| Date    | Device | Status |
| 1990/01 | 50     | ON     |
| 1990/01 | 20     | ON     |
| 1990/03 | 25     | ON     |
| 1990/05 | 50     | OFF    |
| 2000/01 | 20     | OFF    |

Basically, I log change of devices status when it occurs, but this is not periodic so 'Date' column is not 'continous'. I want to query device status at any date with some simple logic E.g.

df(device=50, date=1990/01) -> ON - easy
df(device=50, date=1990/02) -> ON , there is no 1990/02 for device 50, but when device is set to specific Status, then it stays in it unless there is no change
df(device=50, date=1990/05) -> OFF
df(device=50, date=2000/09) -> OFF
df(device=50, date=1900/01) -> OFF, this is a tricky one, before first ON, device is in OFF status

The second case is with different statuses, but to make question concise I'll keep it the same. Behavior is exactly the same except point 5. when we assume that device before first logged ON status is also in ON status. There could be more statuses but logic to decide device status is exactly the same.

How to do it in pandastic way?

jezrael · Accepted Answer

In my opinion there has to be define all possible dates for select in date_range which is used for reindex of pivoted DataFrame. NaNs are replaced first by forward filling and all first NaN values to OFF by fillna:

print (df)
      Date  Device Status
0  1990/01      50     ON
1  1990/01      20     ON
2  1990/03      25     ON
3  1990/05      50    OFF
4  1990/05      20    OFF <-changed for smaller output df

df['Date'] = pd.to_datetime(df['Date'])

rng = pd.date_range('1989-10-01', '1991-01-01', freq='MS')
df = df.pivot('Date','Device','Status').reindex(rng).ffill().fillna('OFF')
print (df)
Device       20   25   50
1989-10-01  OFF  OFF  OFF
1989-11-01  OFF  OFF  OFF
1989-12-01  OFF  OFF  OFF
1990-01-01   ON  OFF   ON
1990-02-01   ON  OFF   ON
1990-03-01   ON   ON   ON
1990-04-01   ON   ON   ON
1990-05-01  OFF   ON  OFF
1990-06-01  OFF   ON  OFF
1990-07-01  OFF   ON  OFF
1990-08-01  OFF   ON  OFF
1990-09-01  OFF   ON  OFF
1990-10-01  OFF   ON  OFF
1990-11-01  OFF   ON  OFF
1990-12-01  OFF   ON  OFF
1991-01-01  OFF   ON  OFF

Last if need original format of dates add strftime:

df.index = df.index.strftime('%Y/%m')
print (df)
Device    20   25   50
1989/10  OFF  OFF  OFF
1989/11  OFF  OFF  OFF
1989/12  OFF  OFF  OFF
1990/01   ON  OFF   ON
1990/02   ON  OFF   ON
1990/03   ON   ON   ON
1990/04   ON   ON   ON
1990/05  OFF   ON  OFF
1990/06  OFF   ON  OFF
1990/07  OFF   ON  OFF
1990/08  OFF   ON  OFF
1990/09  OFF   ON  OFF
1990/10  OFF   ON  OFF
1990/11  OFF   ON  OFF
1990/12  OFF   ON  OFF
1991/01  OFF   ON  OFF

EDIT:

More general solution:

def get_status(df, device, check_date):
    check_date = pd.to_datetime(check_date)
    df['Date'] = pd.to_datetime(df['Date'])

    rng = pd.date_range(df['Date'].min(), df['Date'].max(), freq='MS')
    df = df.pivot('Date','Device','Status').reindex(rng).ffill().fillna('OFF')
    #print (df)

    if check_date < df.index.min():
        return 'OFF'
    elif check_date > df.index.max():
        return df.loc[df.index[-1], device]
    else:
        return df.loc[check_date, device]

print (get_status(df, 50, '1990/01'))
#ON
print (get_status(df, 50, '1990/02'))
#ON
print (get_status(df, 50, '1990/05'))
#OFF
print (get_status(df, 50, '1990/09'))
#OFF
print (get_status(df, 50, '1900/01'))
#OFF

Smart indexing by time in pandas

Answers (2)

Related Questions