Reputation: 684
I would like to efficiently filter a dataframe according to values in multiple columns.
I have this dataframe:
index | type | group | home |
---|---|---|---|
a | offence | 1 | home |
b | offence | 2 | home |
c | offence | 2 | away |
d | offence | 3 | home |
e | defence | 3 | away |
I want to identify the rows preceding a row with type `defence` that have a `group` equal to or one less than the `defence` row's `group`, and whose `home` value is not the same as the `defence` row's. E.g. for the above:
index | type | group | home |
---|---|---|---|
b | offence | 2 | home |
d | offence | 3 | home |
So I want to return `['b', 'd']`.
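For reference, the example frame can be built like this (assuming the letters form the index, which the later `df.loc` filtering suggests):

import pandas as pd

df = pd.DataFrame(
    {
        'type': ['offence', 'offence', 'offence', 'offence', 'defence'],
        'group': [1, 2, 2, 3, 3],
        'home': ['home', 'home', 'away', 'home', 'away'],
    },
    index=list('abcde'),
)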
This is easy to do by iterating:
return_rows = []
for _, def_row in df[df['type'] == 'defence'].iterrows():
    this_iloc = df.index.get_loc(def_row.name)
    # walk backwards from the defence row
    while True:
        this_iloc -= 1
        if this_iloc < 0:
            break
        this_row = df.iloc[this_iloc]
        # stop once the group drops more than one below the defence row's group
        if this_row['group'] < def_row['group'] - 1:
            break
        if this_row['home'] != def_row['home']:
            return_rows.append(this_row.name)
I can now select these rows using `df.loc[df.index.isin(return_rows)]`.
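Running the loop on the example frame collects the expected labels (in the order of the backwards walk):

print(return_rows)                          # ['d', 'b']
df.loc[df.index.isin(return_rows)]          # rows b and d of the frame above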
I would like to do this vectorized, without iterating.
I would start like this, under the assumption that the rows in question will be at most 5 rows back:
for i in range(1, 6):
    df = pd.concat([df, df.shift(i).add_suffix(str(i))], axis=1)

return_rows = df[
    ((df['group1'] >= df['group'] - 1) & (df['home1'] != df['home'])) |
    ((df['group2'] >= df['group'] - 1) & (df['home2'] != df['home'])) |
    ... (etc.)
]
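Spelled out, the same condition could be accumulated in a loop over the suffixed columns (only a sketch of the idea above, using the group1/home1, group2/home2, ... columns created by add_suffix):

mask = pd.Series(False, index=df.index)
for i in range(1, 6):
    # group{i} / home{i} are the copies of group / home shifted i rows down
    mask |= (df[f'group{i}'] >= df['group'] - 1) & (df[f'home{i}'] != df['home'])
return_rows = df[mask]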
However, this would return the `defence` rows, and I actually want to return the rows preceding the `defence` rows.
How do I do this in a vectorised way?
Upvotes: 0
Views: 228
Reputation: 323316
Try with `shift`:
out = df[df.home.eq('away').shift(-1).dropna() & df.home.ne('away')]
Out[376]:
index type group home
1 b offence 2 home
3 d offence 3 home
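This keeps every row whose next row has home equal to 'away' while the row itself does not. With the letters as the index, as in the question, the labels come straight back:

out.index.tolist()
# ['b', 'd']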
Update
s = df.home.eq('away').groupby(df.group).idxmax()-1
df.loc[s[s>=0]]
Out[384]:
index type group home
1 b offence 2 home
3 d offence 3 home
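Here idxmax picks, per group, the index of the first home == 'away' row (or the group's first row when there is none); subtracting 1 steps back to the row just above, and s >= 0 discards candidates that fall before the start of the frame (group 1 here). Note this relies on the default integer index, so the label arithmetic lands on the preceding row.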
Upvotes: 3