Condition based on nth row above or below current row - Python

Question

I want to replicate something similar to the "Multi Row Formula" tool in Alteryx. I am currently reading a csv file and would like to be able to set a value in a list if the value of another row in the same tested column is true.

Sample Data

**Country**
China
India
Brazil
Indonesia

When it loops through the row containing Brazil, "Y" should be appended into the new list since China is two rows above. "N" should be appended for the rest.

import pandas as pd

csv_in = pd.read_csv('C:/sample.csv')

kind = []                           

for row in csv_in['Country']:
    if ***two rows above this row*** == 'China':
        kind.append('Y')
    elif ***one row below this row*** == 'Canada':
        kind.append('Y')
    else:
        kind.append("N")

csv_in['Result'] = kind

I am having trouble finding anything specific to this issue. Any help would be appreciated!

EDIT: I realized I need to do a little more in addition to what I asked initially.

  for row in csv_in['Country']:
      if  'hina' in ***two rows above this row***:
          kind.append('Y')
      elif ***one row below this row***.startswith('Can'):
          kind.append('X')
      else:
          kind.append("N")

root · Accepted Answer

Use shift to build a Boolean array of where the 'Y' values should appear, then use numpy.where to create the column:

import numpy as np

y_cond = (csv_in.shift(2) == 'China') | (csv_in.shift(-1) == 'Canada')
csv_in['Result'] = np.where(y_cond, 'Y', 'N')

If you have more than one column in your DataFrame, you'll need to use csv_in['Country'].shift() instead of the shorter notation in the code above.

The resulting output on some slightly expanded sample data:

     Country Result
0      China      N
1      India      N
2     Brazil      Y
3  Indonesia      N
4     Bhutan      N
5     Mexico      Y
6     Canada      N
7       Peru      N
8   Honduras      N

EDIT:

If you want to assign non-binary values, I'd take a slightly different approach.

Begin by initializing the results as 'N'. For each condition, create a Boolean array similarly as before, and use loc to assign the desired value. Do this in reverse order of importance, as subsequent matches will overwrite previous ones.

Note that you can use the .str accessor to apply string functions to a column, as described in the Working with Text Data section of the documentation.

csv_in['Result'] = 'N'

x_cond = csv_in['Country'].shift(-1).str.startswith('Can').fillna(False)
csv_in.loc[x_cond, 'Result'] = 'X'

y_cond = csv_in['Country'].shift(2).str.contains('hina').fillna(False)
csv_in.loc[y_cond, 'Result'] = 'Y'

The .fillna(False) is necessary as loc needs purely Boolean values, and shift introduces NaN values. If you really want to write your conditions in order of importance, you could do something like x_cond & (csv_in['Result'] == 'N') inside of loc, although it might hinder performance.

Updated output:

     Country Result
0      China      N
1      India      N
2     Brazil      Y
3  Indonesia      N
4     Bhutan      N
5     Mexico      X
6     Canada      N
7       Peru      N
8   Honduras      N

Condition based on nth row above or below current row - Python

Answers (1)

Related Questions