ah bon

Reputation: 10011

Filter rows where a column's values contain any element of a list in Pandas

Given a dataset as follows:

            city  value
0   beijing city     23
1  shanghai city     34
2      guangzhou     45
3  shenzhen city     56
4     wuhan city     67
5      xian city     78

I would like to filter rows based on a list: cities = ['beijing', 'guangzhou', 'shenzhen']

A row should be kept if its value in the city column contains any element of the list.

How could I do that in Pandas? Thanks.

The expected result:

            city  value
0   beijing city     23
1      guangzhou     45
2  shenzhen city     56

Upvotes: 0

Views: 37

Answers (2)

wwnde

Reputation: 26676

Split city on whitespace, pick the first token by index, then check for a match with str.contains:

df[df['city'].str.split(r'\s').str[0].str.contains('|'.join(cities))]

            city  value
0   beijing city     23
2      guangzhou     45
3  shenzhen city     56

Following your comments below, use:

df[df['city'].str.split(r'\s').str[0].isin(cities)]
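
For reference, here is a minimal self-contained sketch of the isin variant, using the data from the question. It assumes the city name is always the first whitespace-separated token; the variable names first_word and exact are only illustrative. Unlike str.contains, isin requires the whole token to match, so a partial name such as 'beij' is not picked up.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "city": ["beijing city", "shanghai city", "guangzhou",
             "shenzhen city", "wuhan city", "xian city"],
    "value": [23, 34, 45, 56, 67, 78],
})
cities = ["beijing", "guangzhou", "shenzhen"]

# Assumption: the city name is the first whitespace-separated token.
first_word = df["city"].str.split().str[0]

# isin() matches the whole token exactly.
exact = df[first_word.isin(cities)]
print(exact)

# Contrast: a partial name matches with str.contains but not with isin.
partial_contains = first_word.str.contains("beij")
partial_isin = first_word.isin(["beij"])
```

Use isin when the list holds complete city names and substring hits would be wrong; use str.contains when partial matches are acceptable.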

Upvotes: 1

Anurag Dabas

Reputation: 24314

Try via str.contains():

m = df['city'].str.contains('|'.join(cities))

Finally:

out = df[m]
#OR
out = df.loc[m]
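
A runnable sketch of the same idea on the question's data. Two things here are my additions, not part of the snippet above: re.escape, which guards against list entries that contain regex metacharacters, and na=False, which treats missing values as non-matches. The reset_index call is only there to reproduce the 0, 1, 2 index shown in the expected result.

```python
import re

import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "city": ["beijing city", "shanghai city", "guangzhou",
             "shenzhen city", "wuhan city", "xian city"],
    "value": [23, 34, 45, 56, 67, 78],
})
cities = ["beijing", "guangzhou", "shenzhen"]

# Build one alternation pattern; escape each entry so characters such as
# "." or "(" are matched literally (an added safeguard).
pattern = "|".join(re.escape(c) for c in cities)

# Boolean mask: True where the city value contains any list element.
m = df["city"].str.contains(pattern, na=False)

# Keep matching rows and renumber the index from 0.
out = df[m].reset_index(drop=True)
print(out)
```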

Note: If the words in the DataFrame are in mixed case (uppercase, lowercase, or title case), you can use the IGNORECASE flag from the re module, so the method above becomes:

from re import IGNORECASE

m = df['city'].str.contains('|'.join(cities), flags=IGNORECASE)

Finally:

out = df[m]
#OR
out = df.loc[m]
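
A small sketch of the case-insensitive variant. The mixed-case rows below are made-up sample data, not from the question. It also shows that the case=False parameter of str.contains should give the same mask as passing the IGNORECASE flag.

```python
import re

import pandas as pd

# Hypothetical mixed-case data for illustration only.
df = pd.DataFrame({
    "city": ["Beijing City", "SHANGHAI city", "GuangZhou", "wuhan city"],
    "value": [23, 34, 45, 67],
})
cities = ["beijing", "guangzhou", "shenzhen"]
pattern = "|".join(cities)

# Option 1: pass the regex flag explicitly.
m_flags = df["city"].str.contains(pattern, flags=re.IGNORECASE)

# Option 2: use the built-in case parameter.
m_case = df["city"].str.contains(pattern, case=False)

out = df[m_flags]
print(out)
```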

Upvotes: 1
