JachymDvorak
JachymDvorak

Reputation: 149

Python iterate over df and refer to next row

I have a dataframe with two columns - Col1 and Col2. I want to iterate through it and return only the values that follow the pattern:

if A followed by B, return AB, else return nothing. I want to basically get rid of all A's that are not immediatelly followed by B's. Note that B's are in my real example always preceded by A.

df = pd.DataFrame({"Col1": ['A', 'B', 'A', 'A', 'B'],
                   "Col2": [11, 22, 33, 44, 55]})

  Col1  Col2
0    A    11
1    B    22
2    A    33
3    A    44
4    B    55

I came up with this code:

for index, row in df.iterrows():
    if row['Col1'] == 'A':
        if row.shift(1)['Col1'] == 'B':
            print(row)
            print(row.shift(1))
        else:
            continue
    else:
        continue

I did just the "print function" for now to see if the correct entry prints but I'll be glad if anyone helps me with a way to create a new DF with these rows.

The code above returns nothing, but also no error.

Edit: example desired output

  Col1  Col2
0    A    11
1    B    22
3    A    44
4    B    55

Upvotes: 1

Views: 976

Answers (3)

JachymDvorak
JachymDvorak

Reputation: 149

My friend helped me and found a solution, so I am sharing for the ones that stumble accross this.

df["Col3"] = df["Col1"].shift(-1)
df["Col4"] = df["Col1"].shift(1)

df_new = df[((df["Col1"] == "A") & (df["Col3"] == "B")) | ((df["Col1"] == "B") & (df["Col4"] == "A"))]
df_new = df_new[["Col1", "Col2"]]

df_new

Upvotes: 0

Shubham Sharma
Shubham Sharma

Reputation: 71687

Let's create a boolean mask with eq + shift representing the condition where A is followed by B, then use this mask to filter the dataframe:

m1 = df['Col1'].eq('B') & df['Col1'].shift().eq('A')
m2 = df['Col1'].eq('A') & df['Col1'].shift(-1).eq('B')
df = df[m1 | m2]

  Col1  Col2
0    A    11
1    B    22
3    A    44
4    B    55

Upvotes: 1

Ventoii
Ventoii

Reputation: 46

Can you add an example of the output you desire?

Also, it's generally considered bad practise to create 'for' loops in a pandas dataframe (Takes a lot of time too!). I believe the answer to your question lies in this article:

How to iterate over rows in a DataFrame in Pandas

Upvotes: 0

Related Questions