Reputation: 121
I am working with python and pandas with the following table/dataframe, containing information about work shifts. The cycle type defines the working/non-working days (e.g., 5-3 means 5 days working, 3 not working). The cycle day indicates in which day of the cycle the worker is (for a 5-3 cycle, this will go from 1 to 5). Finally, the shift indicates whether the worker has been in a morning (M), afternoon (A) or night (N) shift:
Worker | Cycle_type | Cycle_day | Shift |
---|---|---|---|
Alice | 5-3 | 1 | M |
Alice | 5-3 | 2 | M |
Alice | 5-3 | 3 | A |
Alice | 5-3 | 4 | A |
Alice | 5-3 | 5 | N |
Bob | 6-2 | 1 | N |
Bob | 6-2 | 2 | M |
Bob | 6-2 | 3 | M |
Bob | 6-2 | 4 | N |
Bob | 6-2 | 5 | A |
Bob | 6-2 | 6 | M |
The idea is to obtain a new column which indicates the sequence of shifts up the moment of the row. For example, Alice works 2 mornings, then 2 afternoons and then one night, so the sequence the last day would be "M1,M2,A1,A2,N1", where the numbers indicate the numer of iterations of each shift. In the case of Bob, the full sequence for the last day would be "N1,M1,M2,N2,A1,M3". The desired table would be te following:
Worker | Cycle_type | Cycle_day | Shift | Sequence |
---|---|---|---|---|
Alice | 5-3 | 1 | M | M1 |
Alice | 5-3 | 2 | M | M1,M2 |
Alice | 5-3 | 3 | A | M1,M2,A1 |
Alice | 5-3 | 4 | A | M1,M2,A1,A2 |
Alice | 5-3 | 5 | N | M1,M2,A1,A2,N1 |
Bob | 6-2 | 1 | N | N1 |
Bob | 6-2 | 2 | M | N1,M1 |
Bob | 6-2 | 3 | M | N1,M1,M2 |
Bob | 6-2 | 4 | N | N1,M1,M2,N2 |
Bob | 6-2 | 5 | A | N1,M1,M2,N2,A1 |
Bob | 6-2 | 6 | M | N1,M1,M2,N2,A1,M3 |
This sequence must be computed for each day, not knowing about the future, this is, the third day of Alice has a sequence "M1,M2,A1", as it doesn't know about the fourth and fifth day yet. I am unable to think of a way, using python and pandas, to capture all these details. Any ideas?
EDIT: It would also be nice to have the full sequence as a new column, "Full_sequence", where it is shown regardless of the day, if possible
Upvotes: 1
Views: 165
Reputation: 11321
This is an older question, so the answer might be moot.
Unfortunately GroupBy.cumsum
(as well as GroupBy.transform("cumsum")
) doesn't like strings (and lists), so one way to approach this would be to use accumulate
from itertools
:
from itertools import accumulate
def cumjoin(ser): return accumulate(ser, lambda acc, s: f"{acc},{s}")
cycle = df.groupby("Worker")["Cycle_day"].diff().fillna(-1.0).lt(0.0).cumsum()
df["Sequence"] = (
(df["Shift"] + (df.groupby([cycle, "Shift"]).cumcount() + 1).astype("str"))
.groupby(cycle).transform(cumjoin)
)
df["Full_sequence"] = df["Sequence"].groupby(cycle).transform("last")
Result (df
your dataframe):
Worker Cycle_type Cycle_day Shift Sequence Full_sequence
0 Alice 5-3 1 M M1 M1,M2,A1,A2,N1
1 Alice 5-3 2 M M1,M2 M1,M2,A1,A2,N1
2 Alice 5-3 3 A M1,M2,A1 M1,M2,A1,A2,N1
3 Alice 5-3 4 A M1,M2,A1,A2 M1,M2,A1,A2,N1
4 Alice 5-3 5 N M1,M2,A1,A2,N1 M1,M2,A1,A2,N1
5 Bob 6-2 1 N N1 N1,M1,M2,N2,A1,M3
6 Bob 6-2 2 M N1,M1 N1,M1,M2,N2,A1,M3
7 Bob 6-2 3 M N1,M1,M2 N1,M1,M2,N2,A1,M3
8 Bob 6-2 4 N N1,M1,M2,N2 N1,M1,M2,N2,A1,M3
9 Bob 6-2 5 A N1,M1,M2,N2,A1 N1,M1,M2,N2,A1,M3
10 Bob 6-2 6 M N1,M1,M2,N2,A1,M3 N1,M1,M2,N2,A1,M3
There's a way to avoid accumulate
, but it is a bit convoluted:
cycle = df.groupby("Worker")["Cycle_day"].diff().fillna(-1.0).lt(0.0).cumsum()
seq = df["Shift"] + (df.groupby([cycle, "Shift"]).cumcount() + 1).astype("str")
grouper = seq.groupby(cycle)
idx, idx_count = seq.index, grouper.transform("size") - grouper.cumcount()
seq_df = pd.DataFrame({"Sequence": seq, "Cycle": cycle}).loc[idx.repeat(idx_count), :]
grouper = seq_df.groupby(["Cycle", "Sequence"])
groups = grouper.transform("size") - grouper.cumcount()
df["Sequence"] = (
seq_df.groupby(["Cycle", groups], sort=False).agg({"Sequence": ",".join})
.reset_index(drop=True)
)
Upvotes: 2
Reputation: 234
First Add a column into the Data frame and Just Update the data frame column order.
# Declare a list that is to be converted into a column
Sequence = [value of Sequence columns]
# Using 'Sequence' as the column name
# and equating it to the list
df['Sequence'] = Sequence
df = df[['Worker','Cycle_type','Cycle_day','Schedule','Sequence']]
Upvotes: 0