N.Foe

Reputation: 79

New column based on threshold of next rows in Python Pandas

I have a pandas dataframe column that looks as follows:

col1
200    
300    
400    
200    
500    
700    
0      
0      
60     
0      
0      

I'm trying to create a new column whose value depends on the current row's value and also on the values in the next few rows. For example: if col1[i] > 60 and col1[i+1] + col1[i+2] + col1[i+3] > 100, write 'yes' in col2, otherwise 'no'.

col1   col2
200    yes
300    yes
400    yes
200    yes
500    yes
700    yes
0      no
0      no
60     no
0      no
0      no

Any ideas on how to achieve this?

Upvotes: 0

Views: 417

Answers (1)

Ji Wei

Reputation: 881

You can handle this directly in the DataFrame with the .shift() method: shift(-k) lines up the value from row i+k with row i. Refer to this link for an example.
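Here is a minimal sketch of the .shift() approach. It assumes col1 is numeric and uses the thresholds 60 and 100 from the question. The sample data is copied from the question, and np.where is NumPy's element-wise selector.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col1': [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})

# Sum of the three following rows: shift(-k) pulls row i+k up to row i.
# The last three rows have no complete look-ahead window, so the sum is NaN there.
next3 = df['col1'].shift(-1) + df['col1'].shift(-2) + df['col1'].shift(-3)

# Comparisons with NaN evaluate to False, so incomplete windows become 'no'.
mask = (df['col1'] > 60) & (next3 > 100)
df['col2'] = np.where(mask, 'yes', 'no')
print(df)
```

Applied literally, the rule gives 'no' for the row with 700, because 0 + 0 + 60 = 60 is not greater than 100. The expected table in the question shows 'yes' there, so the thresholds or the window may need adjusting if that row should qualify.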

Here's an approach that uses zip and plain lists:

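import pandas as pd

df = pd.DataFrame({'col1': [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})

l1 = df['col1'].tolist()
l2 = []
# a is the current value; b, c, d are the next three values
for a, b, c, d in zip(l1, l1[1:], l1[2:], l1[3:]):
    # use `and`, not `&`: `&` binds tighter than `>`, so `a>60 & (b+c+d)>100` tests the wrong thing
    if a > 60 and (b + c + d) > 100:
        l2.append('yes')
    else:
        l2.append('no')
# the last 3 rows have no full look-ahead window; left blank here, use 'yes', 'no' or 'NA' as you prefer
l2 += ['', '', '']
df['col2'] = l2  # assign the list directly so it lines up by position, whatever the index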
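If the size of the look-ahead window may change, the same idea generalizes with rolling(). This is a sketch using a hypothetical helper, flag_next_rows (not part of pandas), whose parameters default to the values in the question.

```python
import numpy as np
import pandas as pd


def flag_next_rows(s, n=3, current_min=60, next_sum_min=100):
    """Hypothetical helper: 'yes' where s[i] > current_min and the sum of
    s[i+1] .. s[i+n] exceeds next_sum_min, otherwise 'no'."""
    # rolling(n).sum() at row j covers rows j-n+1 .. j; shifting by -n
    # moves that window so row i sees rows i+1 .. i+n (NaN near the end).
    next_sum = s.rolling(n).sum().shift(-n)
    # NaN > next_sum_min is False, so rows without a full window get 'no'.
    mask = (s > current_min) & (next_sum > next_sum_min)
    return pd.Series(np.where(mask, 'yes', 'no'), index=s.index)


df = pd.DataFrame({'col1': [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})
df['col2'] = flag_next_rows(df['col1'])
print(df)
```

Unlike the loop, the trailing rows get 'no' rather than a blank, which matches the expected output in the question. The returned Series also keeps the original index.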

Upvotes: 1
