Reputation: 435
I want to filter out rows contains specific values through out multiple columns in data frame.
E.g
code tag number floor note
1 1111 * ** 34 no
2 2323 7 899 7 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
6 3677 55 908 11 no
I want to filter out all rows contains #, *, ** in columns code, tag, number, floor.
What I want to get is
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
I was trying to use isin method in data frame, but it does work with one column, but does not work in multiple columns. Thanks!
Upvotes: 2
Views: 1429
Reputation: 294258
Option 1
Assuming no other pre-existing pir
df[df.replace(['#', '*', '**'], 'pir').eq('pir').any(1)]
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
Option 2
Obnoxious numpy
broadcasting. Fast at first but scales quadratically
df[(df.values[None, :] == np.array(['*', '**', '#'])[:, None, None]).any(0).any(1)]
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
Option 3
Less obnoxious np.in1d
df[np.in1d(df.values, ['*', '**', '#']).reshape(df.shape).any(1)]
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
Option 4
Over the top with map
df[list(
map(bool,
map({'*', '**', '#'}.intersection,
map(set,
zip(*(df[c].values.tolist() for c in df)))))
)]
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
Upvotes: 1
Reputation: 402493
You could also use df.applymap
:
s = {'*', '**', '#'}
df[df.applymap(lambda x: x in s).max(1)]
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
piR suggested a crazy (but it works!) alternative:
df[df.apply(set, 1) & {'*', '**', '#'}]
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
Upvotes: 1
Reputation: 153460
I think you need apply
, isin
, and any
with boolean indexing:
list = ['#','*','**']
cols = ['code','tag','number','floor']
df[df[cols].apply(lambda x: x.isin(list).any(), axis=1)]
Output:
code tag number floor note
1 1111 * ** 34 no
3 3677 # 900 11 no
4 9897 10 134 * no
5 # # 566 11 no
Upvotes: 1