Reputation: 684
I would like to efficiently filter a dataframe according to values in multiple columns.
I have this dataframe:
index | type | group | home |
---|---|---|---|
a | offence | 1 | home |
b | offence | 2 | home |
c | offence | 2 | away |
d | offence | 3 | home |
e | defence | 3 | away |
I want to identify the rows preceding a row with type `defence` that have a `group` equal to or one less than the `defence` row's `group`, and whose `home` value is not the same as the `defence` row's. E.g. for the above:
index | type | group | home |
---|---|---|---|
b | offence | 2 | home |
d | offence | 3 | home |
So I want to return `['b', 'd']`.
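For reference, the example frame can be built like this (assuming the letters form the index, which the later `df.loc` filtering suggests):

import pandas as pd

df = pd.DataFrame(
    {
        'type': ['offence', 'offence', 'offence', 'offence', 'defence'],
        'group': [1, 2, 2, 3, 3],
        'home': ['home', 'home', 'away', 'home', 'away'],
    },
    index=list('abcde'),
)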
This is easy to do by iterating:
return_rows = []
for _, def_row in df[df['type'] == 'defence'].iterrows():
    this_iloc = df.index.get_loc(def_row.name)
    # walk backwards from the defence row
    while True:
        this_iloc -= 1
        if this_iloc < 0:
            break
        this_row = df.iloc[this_iloc]
        # stop once the group drops more than one below the defence row's group
        if this_row['group'] < def_row['group'] - 1:
            break
        if this_row['home'] != def_row['home']:
            return_rows.append(this_row.name)
I can now select these rows using `df.loc[df.index.isin(return_rows)]`.
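Running the loop on the example frame collects the expected labels (in the order of the backwards walk):

print(return_rows)                          # ['d', 'b']
df.loc[df.index.isin(return_rows)]          # rows b and d of the frame above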
I would like to do this vectorized, without iterating.
I would start like this, under the assumption that the rows in question will be at most 5 rows back:
for i in range(1, 6):
    df = pd.concat([df, df.shift(i).add_suffix(str(i))], axis=1)

return_rows = df[
    ((df['group1'] >= df['group'] - 1) & (df['home1'] != df['home'])) |
    ((df['group2'] >= df['group'] - 1) & (df['home2'] != df['home'])) |
    ... (etc.)
]
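Spelled out, the same condition could be accumulated in a loop over the suffixed columns (only a sketch of the idea above, using the group1/home1, group2/home2, ... columns created by add_suffix):

mask = pd.Series(False, index=df.index)
for i in range(1, 6):
    # group{i} / home{i} are the copies of group / home shifted i rows down
    mask |= (df[f'group{i}'] >= df['group'] - 1) & (df[f'home{i}'] != df['home'])
return_rows = df[mask]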
However, this would return the `defence` rows, and I actually want to return the rows preceding the `defence` rows.
How do I do this in a vectorised way?
Upvotes: 0
Views: 228
Reputation: 323316
Try with `shift`:
out = df[df.home.eq('away').shift(-1).dropna() & df.home.ne('away')]
Out[376]:
index type group home
1 b offence 2 home
3 d offence 3 home
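This keeps every row whose next row has home equal to 'away' while the row itself does not. With the letters as the index, as in the question, the labels come straight back:

out.index.tolist()
# ['b', 'd']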
Update
s = df.home.eq('away').groupby(df.group).idxmax()-1
df.loc[s[s>=0]]
Out[384]:
index type group home
1 b offence 2 home
3 d offence 3 home
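Here idxmax picks, per group, the index of the first home == 'away' row (or the group's first row when there is none); subtracting 1 steps back to the row just above, and s >= 0 discards candidates that fall before the start of the frame (group 1 here). Note this relies on the default integer index, so the label arithmetic lands on the preceding row.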
Upvotes: 3