Koen Wijnen
Koen Wijnen

Reputation: 45

Iterate over rows in Python, count and stop when certain value is reached

I have a dataset which contains a lot of items for which I track the status each week (so an item can occur multiple times in the dataset). I would like to build logic which counts the number of consecutive weeks an item has had a given status. Per item I would like to see how long it was status "z" and preferably in which week the item was status "z" for the last time. I only want the counter to start from the first week the item became status "z". Once it runs into a week where this item was no longer status "z" I want the counter to stop and insert the value it has at the original row. For all weeks I only want to take historical weeks into account. (Week 2 should not take week 3 into account).

Furthermore, I would like to include the most recent week it had status z. Also, for items which don't have status z in the current week I would like to see when the last week was when status z was applicable.

df = pd.DataFrame({'WeekNr': [202301,202302,202303,202304,202305,202301,202302,202303,202304,202305], 'Status': ['A', 'A', 'A', 'Z', 'Z', 'Z', 'A', 'A', 'Z', 'Z'], 'Item': ['x', 'x', 'x', 'x', 'x', 'y', 'y', 'y', 'y','y']})

First, I sort my dataframe to make sure we iterate in a chronological order:
df.sort_values('WeekNr', ascending = False)

check = 0
for index, row in df.iterrows():
    for index2,row2 in df.iterrows():
        if row["Item"] == row2["Item"]:
            if row2["Status"] == "z":
                check += 1
        elif row["Item"] == row2["Item"]:
            if row2["Status"] != "z":
                row["Check"] = check
        else:
            continue
    Check = 0    


Preferred output would be:
202301    A    x    0    -
202302    A    x    0    -
202303    A    x    0    -
202304    Z    x    1    202304
202305    Z    x    2    202304
202301    Z    y    1    202301
202302    A    y    0    202301
202303    A    y    0    202301
202304    Z    y    1    202304
202305    Z    y    2    202304

Could someone point out what I am doing wrong/suggest some improvements?

Thanks!

Upvotes: 0

Views: 204

Answers (1)

mozway
mozway

Reputation: 261810

I would use:

# sort by Item/WeekNr
df = df.sort_values(by=['Item', 'WeekNr'])

# identify Z
m = df['Status'].eq('Z')

# group by successsive Status
group = group = df['Status'].ne(df.groupby('Item')['Status'].shift()).cumsum()

# set up grouper
g = df.groupby(group)

# add counter
df['counter'] = g.cumcount().add(1).where(m, 0)

# Get first WeekNr, or previous one for Z
df['Check'] = (g['WeekNr'].transform('first')
                 .where(m).groupby(df['Item']).ffill()
              )

Output:

   WeekNr Status Item  counter     Check
0  202301      A    x        0       NaN
1  202302      A    x        0       NaN
2  202303      A    x        0       NaN
3  202304      Z    x        1  202304.0
4  202305      Z    x        2  202304.0
5  202301      Z    y        1  202301.0
6  202302      A    y        0  202301.0
7  202303      A    y        0  202301.0
8  202304      Z    y        1  202304.0
9  202305      Z    y        2  202304.0

Upvotes: 2

Related Questions