Setting pandas dataframe value based on row and column conditions

Question

I have a fairly specific algorithm I want to follow.

Basically I have a dataframe as follows:

        month   taken   score
1       1       2       23
2       1       1       34
3       1       2       12
4       1       2       59
5       2       1       12
6       2       2       23
7       2       1       43
8       2       2       45
9       3       1       43
10      3       2       43
11      4       1       23
12      4       2       94

I want to make it so that the 'score' column is changed to 100 on days where taken == 2 continuously until the end of that month. So, not all occurrences of taken == 2 have their score set to 100, if any day following during that month has a taken == 1.

So the result I'd want is:

        month   taken   score
1       1       2       23
2       1       1       34
3       1       2       100
4       1       2       100
5       2       1       12
6       2       2       23
7       2       1       43
8       2       2       100
9       3       1       43
10      3       2       43
11      3       1       23
12      3       2       100
13      4       1       32
14      4       2       100

I've written this code which I feel should do it:

#iterate through months
for month in range(12):
    #iterate through scores
    for score in range(len(df_report.loc[df_report['month'] == month+1])):
        #starting from the bottom, of that month, if 'taken' == 2...
        if df_report.loc[df_report.month==month+1, 'taken'].iloc[-score-1] == 2:
            #then set the score to 100
            df_report.loc[df_report.month==month+1, 'score'].iloc[-score-2] = 100
        #if you run into a 'taken' == 1, move on to next month
        else: break

However, this doesn't appear to change any values, despite not throwing an error... it also doesn't give me an error about setting values to a copied dataframe.

Could anyone explain what I'm doing wrong?

cs95 · Accepted Answer

The reason for your values not being updated is that assignment to iloc updates the copy returned by the preceding loc call, so the original is not touched.

Here's how I'd tackle this. First, define a function foo.

def foo(df):
    for i in reversed(df.index):
        if df.loc[i, 'taken'] != 2:
            break
        df.loc[i, 'score'] = 100
        i -= 1
    return df

Now, groupby month and call foo:

df = df.groupby('month').apply(foo)
print(df) 
    month  taken  score
1       1      2     23
2       1      1     34
3       1      2    100
4       1      2    100
5       2      1     12
6       2      2     23
7       2      1     43
8       2      2    100
9       3      1     43
10      3      2    100
11      4      1     23
12      4      2    100

Obviously, apply has its shortcomings, but I cannot think of a vectorised approach to this problem.

Setting pandas dataframe value based on row and column conditions

Answers (2)

Related Questions