What is an efficient pandas method to reindex this shift schedule?

Question

I have a pandas dataframe that represents a shift schedule for an entire year, given as:


January 2019    1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  29  30  31
Shift A         1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1
Shift B         0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2
Shift C         0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0
Shift D         2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0
February 2019   1   2   3   4   5   6   7   8   9   10  11  12  13  14  15  16  17  18  19  20  21  22  23  24  25  26  27  28  nan nan nan
Shift A         0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   nan nan nan
Shift B         0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   nan nan nan
Shift C         2   2   2   0   1   1   1   0   0   0   2   2   0   0   1   1   1   1   0   2   2   0   0   0   0   0   0   0   nan nan nan
Shift D         1   1   1   1   0   2   2   0   0   0   0   0   0   0   2   2   2   0   1   1   1   0   0   0   2   2   0   0   nan nan nan

Where 1 represents Day shift (06:00 - 18:00), 2 represents Night shift (18:00 - 06:00) and 0 can be ignored. Only a single shift team will be working for a given period.

I need the data in a format where the data is indexed by the DateTime stamp with the current working shift, e.g. :

             DateTime Shift
0 2019-01-01 06:00:00 A
1 2019-01-01 18:00:00 D
2 2019-01-02 06:00:00 A
3 2019-01-02 18:00:00 B
4 2019-01-03 06:00:00 A
5 2019-01-03 18:00:00 B
.
.
.

What would be the most efficient Pandas method to re-index the data to achieve this, i.e. avoiding for loops?

jezrael · Accepted Answer

Use:

#get first column by position
first = df.iloc[:, 0]
#convert column to datetimes with missing values for no datetimes values
dates = pd.to_datetime(first, errors='coerce')
#mask for data row
mask = dates.isna()
#forward filling missing values and replace first NaNs by first column name
df.index = dates.ffill().fillna(pd.to_datetime(first.name))
#filter out rows with datetimes in first column, add first column to index
df = df[mask.values].set_index(first.name, append=True)
#convert columns names to timedeltas in days, first is 0 days
df.columns = pd.to_timedelta(df.columns.astype(int) - 1, unit='D')
#dictionary for map 1, 2 values
mapp = {1: pd.Timedelta('06:00:00'), 2:pd.Timedelta('18:00:00')}
#remove 0 rows with convert to NaN by mask and reshape by stack
#map by dict and convert MultiIndex to columns
df = (df.mask(df == 0)
       .stack()
       .map(mapp)
       .rename_axis(('Datetime','Shift', 'day'))
       .reset_index(name='td')
       )
#add days to hours and add to Datetime
df['Datetime'] += (df.pop('td') + df.pop('day'))
#sorting ans create default index
df = df.sort_values(['Datetime','Shift']).reset_index(drop=True)

print (df)
               Datetime    Shift
0   2019-01-01 06:00:00  Shift A
1   2019-01-01 18:00:00  Shift D
2   2019-01-02 06:00:00  Shift A
3   2019-01-02 18:00:00  Shift B
4   2019-01-03 06:00:00  Shift A
..                  ...      ...
113 2019-02-26 18:00:00  Shift D
114 2019-02-27 06:00:00  Shift A
115 2019-02-27 18:00:00  Shift B
116 2019-02-28 06:00:00  Shift A
117 2019-02-28 18:00:00  Shift B

[118 rows x 2 columns]

What is an efficient pandas method to reindex this shift schedule?

Answers (2)

Related Questions