Reputation: 149
I have a dataframe with two columns - Col1 and Col2. I want to iterate through it and return only the values that follow the pattern:
if A followed by B, return AB, else return nothing. I want to basically get rid of all A's that are not immediatelly followed by B's. Note that B's are in my real example always preceded by A.
df = pd.DataFrame({"Col1": ['A', 'B', 'A', 'A', 'B'],
"Col2": [11, 22, 33, 44, 55]})
Col1 Col2
0 A 11
1 B 22
2 A 33
3 A 44
4 B 55
I came up with this code:
for index, row in df.iterrows():
if row['Col1'] == 'A':
if row.shift(1)['Col1'] == 'B':
print(row)
print(row.shift(1))
else:
continue
else:
continue
I did just the "print function" for now to see if the correct entry prints but I'll be glad if anyone helps me with a way to create a new DF with these rows.
The code above returns nothing, but also no error.
Edit: example desired output
Col1 Col2
0 A 11
1 B 22
3 A 44
4 B 55
Upvotes: 1
Views: 976
Reputation: 149
My friend helped me and found a solution, so I am sharing for the ones that stumble accross this.
df["Col3"] = df["Col1"].shift(-1)
df["Col4"] = df["Col1"].shift(1)
df_new = df[((df["Col1"] == "A") & (df["Col3"] == "B")) | ((df["Col1"] == "B") & (df["Col4"] == "A"))]
df_new = df_new[["Col1", "Col2"]]
df_new
Upvotes: 0
Reputation: 71687
Let's create a boolean mask with eq
+ shift
representing the condition where A
is followed by B
, then use this mask to filter the dataframe:
m1 = df['Col1'].eq('B') & df['Col1'].shift().eq('A')
m2 = df['Col1'].eq('A') & df['Col1'].shift(-1).eq('B')
df = df[m1 | m2]
Col1 Col2
0 A 11
1 B 22
3 A 44
4 B 55
Upvotes: 1
Reputation: 46
Can you add an example of the output you desire?
Also, it's generally considered bad practise to create 'for' loops in a pandas dataframe (Takes a lot of time too!). I believe the answer to your question lies in this article:
How to iterate over rows in a DataFrame in Pandas
Upvotes: 0