MMM
MMM

Reputation: 435

Filter rows containing certain values in the columns

I want to filter out rows contains specific values through out multiple columns in data frame.

E.g

    code tag number floor  note
1   1111  *   **     34     no 
2   2323  7   899     7     no
3   3677  #   900    11     no
4   9897  10  134    *      no
5    #    #   566    11     no
6   3677  55  908    11     no

I want to filter out all rows contains #, *, ** in columns code, tag, number, floor.

What I want to get is

    code tag number floor  note
1   1111  *   **     34     no 
3   3677  #   900    11     no
4   9897  10  134    *      no
5    #    #   566    11     no

I was trying to use isin method in data frame, but it does work with one column, but does not work in multiple columns. Thanks!

Upvotes: 2

Views: 1429

Answers (3)

piRSquared
piRSquared

Reputation: 294258

Option 1
Assuming no other pre-existing pir

df[df.replace(['#', '*', '**'], 'pir').eq('pir').any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 2
Obnoxious numpy broadcasting. Fast at first but scales quadratically

df[(df.values[None, :] == np.array(['*', '**', '#'])[:, None, None]).any(0).any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 3
Less obnoxious np.in1d

df[np.in1d(df.values, ['*', '**', '#']).reshape(df.shape).any(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Option 4
Over the top with map

df[list(
    map(bool,
        map({'*', '**', '#'}.intersection,
            map(set,
                zip(*(df[c].values.tolist() for c in df)))))
)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Upvotes: 1

cs95
cs95

Reputation: 402493

You could also use df.applymap:

s = {'*', '**', '#'}
df[df.applymap(lambda x: x in s).max(1)]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

piR suggested a crazy (but it works!) alternative:

df[df.apply(set, 1) & {'*', '**', '#'}]

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Upvotes: 1

Scott Boston
Scott Boston

Reputation: 153460

I think you need apply, isin, and any with boolean indexing:

list = ['#','*','**']
cols = ['code','tag','number','floor']
df[df[cols].apply(lambda x: x.isin(list).any(), axis=1)]

Output:

   code tag number floor note
1  1111   *     **    34   no
3  3677   #    900    11   no
4  9897  10    134     *   no
5     #   #    566    11   no

Upvotes: 1

Related Questions