New column based on threshold of next rows in Python Pandas

Question

I have a pandas DataFrame with a column col1. I'm trying to create a new column, col2, whose value depends on the current row's value and also on the next few rows. For example, if col1[i] > 60 and col1[i+1] + col1[i+2] + col1[i+3] > 100, col2 should be 'yes', otherwise 'no'. The data and the expected col2 look like this:

col1   col2
200    yes
300    yes
400    yes
200    yes
500    yes
700    yes
0      no
0      no
60     no
0      no
0      no

Any ideas on how to achieve this?

Ji Wei · Accepted Answer

You can handle this in a DataFrame using the .shift() method. Refer to this link for an example.
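Since the linked example is not reproduced here, below is a minimal sketch of the shift() idea. It assumes the column is named col1, that "the next few rows" means exactly the next three, and it uses numpy.where to turn the boolean mask into 'yes'/'no'. Note the parentheses around each comparison: & binds more tightly than >, so they are required.

```python
import numpy as np
import pandas as pd

# Sample data taken from the question
df = pd.DataFrame({'col1': [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})

# Sum of the next three rows: shift(-k) moves row i+k up to row i.
# The last three rows get NaN because they have fewer than three successors.
next3 = df['col1'].shift(-1) + df['col1'].shift(-2) + df['col1'].shift(-3)

# Comparisons with NaN are False, so the last three rows come out as 'no'
df['col2'] = np.where((df['col1'] > 60) & (next3 > 100), 'yes', 'no')
print(df)
```

With the sample data this reproduces the expected table except for the row holding 700: the next three values there are 0, 0 and 60 (sum 60), so the rule as stated gives 'no' while the question's table shows 'yes'.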

Alternatively, here's an approach that works on a plain Python list with zip:

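l1 = df['col1'].tolist()
l2 = []
# zip stops at the shortest list, so this yields one (a, b, c, d) window per row that has 3 following rows
for a, b, c, d in zip(l1, l1[1:], l1[2:], l1[3:]):
    # use `and`, not `&`: `&` binds tighter than `>`, so `a>60 & (b+c+d)>100` evaluates a different expression
    if a > 60 and (b + c + d) > 100:
        l2.append('yes')
    else:
        l2.append('no')
# cater for the last 3 entries, which lack 3 following rows. I've left them blank; you can decide whether to use 'yes', 'no' or 'NA'
l2 += [''] * (len(l1) - len(l2))
# assign the list directly so it lines up by position rather than by index label
df['col2'] = l2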
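If "the next few rows" is not always three, the window size can be made a parameter. The sketch below is a hypothetical helper, flag_next_rows (the name and its parameters threshold, next_sum and n are illustrative, not part of pandas), that uses Series.rolling(...).sum() followed by shift(-n) to get the sum of the next n rows without a Python loop.

```python
import pandas as pd


def flag_next_rows(s: pd.Series, threshold: float, next_sum: float, n: int = 3) -> pd.Series:
    """Hypothetical helper: 'yes' where s[i] > threshold and s[i+1] + ... + s[i+n] > next_sum."""
    # rolling(n).sum() at row j is s[j-n+1] + ... + s[j]; shifting by -n
    # moves the value from row i+n to row i, i.e. the sum of the next n rows.
    following = s.rolling(window=n).sum().shift(-n)
    # NaN (rows with fewer than n successors) compares False, giving 'no'
    mask = (s > threshold) & (following > next_sum)
    return mask.map({True: 'yes', False: 'no'})


df = pd.DataFrame({'col1': [200, 300, 400, 200, 500, 700, 0, 0, 60, 0, 0]})
df['col2'] = flag_next_rows(df['col1'], threshold=60, next_sum=100, n=3)
print(df)
```

Like the loop above, the rows without n successors are decided by the caller's convention; here they default to 'no' because comparisons with NaN are False.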
