James
James

Reputation: 373

Setting pandas dataframe value based on row and column conditions

I have a fairly specific algorithm I want to follow.

Basically I have a dataframe as follows:

        month   taken   score
1       1       2       23
2       1       1       34
3       1       2       12
4       1       2       59
5       2       1       12
6       2       2       23
7       2       1       43
8       2       2       45
9       3       1       43
10      3       2       43
11      4       1       23
12      4       2       94

I want to make it so that the 'score' column is changed to 100 on days where taken == 2 continuously until the end of that month. So, not all occurrences of taken == 2 have their score set to 100, if any day following during that month has a taken == 1.

So the result I'd want is:

        month   taken   score
1       1       2       23
2       1       1       34
3       1       2       100
4       1       2       100
5       2       1       12
6       2       2       23
7       2       1       43
8       2       2       100
9       3       1       43
10      3       2       43
11      3       1       23
12      3       2       100
13      4       1       32
14      4       2       100

I've written this code which I feel should do it:

#iterate through months
for month in range(12):
    #iterate through scores
    for score in range(len(df_report.loc[df_report['month'] == month+1])):
        #starting from the bottom, of that month, if 'taken' == 2...
        if df_report.loc[df_report.month==month+1, 'taken'].iloc[-score-1] == 2:
            #then set the score to 100
            df_report.loc[df_report.month==month+1, 'score'].iloc[-score-2] = 100
        #if you run into a 'taken' == 1, move on to next month
        else: break

However, this doesn't appear to change any values, despite not throwing an error... it also doesn't give me an error about setting values to a copied dataframe.

Could anyone explain what I'm doing wrong?

Upvotes: 0

Views: 338

Answers (2)

Bharath M Shetty
Bharath M Shetty

Reputation: 30605

You can do

import numpy as np
def get_value(x):
    s = x['taken']
    # Get a mask of duplicate sequeence and change values using np.where
    mask = s.ne(s.shift()).cumsum().duplicated(keep=False)
    news = np.where(mask,100,x['score'])

    # if last number is 2 then change the news value to 100
    if s[s.idxmax()] == 2: news[-1] = 100 
    return pd.Series(news)

df['score'] = df.groupby('month').apply(get_value).values

Output :

   month  taken  score
1       1      2     23
2       1      1     34
3       1      2    100
4       1      2    100
5       2      1     12
6       2      2     23
7       2      1     43
8       2      2    100
9       3      1     43
10      3      2    100
11      4      1     23
12      4      2    100

Almost identical speed but @coldspeed is winner

ndf = pd.concat([df]*10000).reset_index(drop=True)

%%timeit
ndf['score'] = ndf.groupby('month').apply(foo)
10 loops, best of 3: 40.8 ms per loop


%%timeit  
ndf['score'] = ndf.groupby('month').apply(get_value).values
10 loops, best of 3: 42.6 ms per loop

Upvotes: 2

cs95
cs95

Reputation: 402263

The reason for your values not being updated is that assignment to iloc updates the copy returned by the preceding loc call, so the original is not touched.


Here's how I'd tackle this. First, define a function foo.

def foo(df):
    for i in reversed(df.index):
        if df.loc[i, 'taken'] != 2:
            break
        df.loc[i, 'score'] = 100
        i -= 1
    return df

Now, groupby month and call foo:

df = df.groupby('month').apply(foo)
print(df) 
    month  taken  score
1       1      2     23
2       1      1     34
3       1      2    100
4       1      2    100
5       2      1     12
6       2      2     23
7       2      1     43
8       2      2    100
9       3      1     43
10      3      2    100
11      4      1     23
12      4      2    100

Obviously, apply has its shortcomings, but I cannot think of a vectorised approach to this problem.

Upvotes: 2

Related Questions