Reputation: 45
I have a dataset which contains a lot of items for which I track the status each week (so an item can occur multiple times in the dataset). I would like to build logic which counts the number of consecutive weeks an item has had a given status. Per item I would like to see how long it was status "z" and preferably in which week the item was status "z" for the last time. I only want the counter to start from the first week the item became status "z". Once it runs into a week where this item was no longer status "z" I want the counter to stop and insert the value it has at the original row. For all weeks I only want to take historical weeks into account. (Week 2 should not take week 3 into account).
Furthermore, I would like to include the most recent week it had status z. Also, for items which don't have status z in the current week I would like to see when the last week was when status z was applicable.
df = pd.DataFrame({'WeekNr': [202301,202302,202303,202304,202305,202301,202302,202303,202304,202305], 'Status': ['A', 'A', 'A', 'Z', 'Z', 'Z', 'A', 'A', 'Z', 'Z'], 'Item': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y','y']})
First, I sort my dataframe to make sure we iterate in a chronological order:
df.sort_values('WeekNr', ascending = False)
check = 0
for index, row in df.iterrows():
for index2,row2 in df.iterrows():
if row["Item"] == row2["Item"]:
if row2["Status"] == "z":
check += 1
elif row["Item"] == row2["Item"]:
if row2["Status"] != "z":
row["Check"] = check
else:
continue
Check = 0
Preferred output would be:
202301 A x 0 -
202302 A x 0 -
202303 A x 0 -
202304 Z x 1 202304
202305 Z x 2 202304
202301 Z y 1 202301
202302 A y 0 202301
202303 A y 0 202301
202304 Z y 1 202304
202305 Z y 2 202304
Could someone point out what I am doing wrong/suggest some improvements?
Thanks!
Upvotes: 0
Views: 204
Reputation: 261810
I would use:
# sort by Item/WeekNr
df = df.sort_values(by=['Item', 'WeekNr'])
# identify Z
m = df['Status'].eq('Z')
# group by successsive Status
group = group = df['Status'].ne(df.groupby('Item')['Status'].shift()).cumsum()
# set up grouper
g = df.groupby(group)
# add counter
df['counter'] = g.cumcount().add(1).where(m, 0)
# Get first WeekNr, or previous one for Z
df['Check'] = (g['WeekNr'].transform('first')
.where(m).groupby(df['Item']).ffill()
)
Output:
WeekNr Status Item counter Check
0 202301 A x 0 NaN
1 202302 A x 0 NaN
2 202303 A x 0 NaN
3 202304 Z x 1 202304.0
4 202305 Z x 2 202304.0
5 202301 Z y 1 202301.0
6 202302 A y 0 202301.0
7 202303 A y 0 202301.0
8 202304 Z y 1 202304.0
9 202305 Z y 2 202304.0
Upvotes: 2