Reputation: 10011
Given a dataset as follows:
            city  value
0   beijing city     23
1  shanghai city     34
2      guangzhou     45
3  shenzhen city     56
4     wuhan city     67
5      xian city     78
I would like to filter rows based on a list: cities = ['beijing', 'guangzhou', 'shenzhen']
If any element of the list is contained in the city column, keep that row.
How can I do that in Pandas? Thanks.
The expected result:
            city  value
0   beijing city     23
1      guangzhou     45
2  shenzhen city     56
Upvotes: 0
Views: 37
Reputation: 26676
Split city on whitespace, take the first word by index, and check for a match with str.contains:
df[df['city'].str.split(r'\s').str[0].str.contains('|'.join(cities))]
            city  value
0   beijing city     23
2      guangzhou     45
3  shenzhen city     56
Following your comments below, to require an exact match on the first word, use:
df[df['city'].str.split(r'\s').str[0].isin(cities)]
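For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the question's data and applies the isin variant. It assumes the city name is always the first whitespace-separated word; the names first_word and result are only illustrative, and reset_index(drop=True) is added to get the 0, 1, 2 index shown in the expected result.

```python
import pandas as pd

# Sample data rebuilt from the question
df = pd.DataFrame({
    "city": ["beijing city", "shanghai city", "guangzhou",
             "shenzhen city", "wuhan city", "xian city"],
    "value": [23, 34, 45, 56, 67, 78],
})
cities = ["beijing", "guangzhou", "shenzhen"]

# Assumption: the city name is the first whitespace-separated word.
# str.split() with no pattern splits on any run of whitespace.
first_word = df["city"].str.split().str[0]

# Keep rows whose first word is exactly one of the listed cities,
# then renumber the index from 0.
result = df[first_word.isin(cities)].reset_index(drop=True)
print(result)
```

Unlike str.contains, isin compares whole strings, so a list entry such as 'xi' would not accidentally match 'xian'.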
Upvotes: 1
Reputation: 24314
Try str.contains():
m = df['city'].str.contains('|'.join(cities))
Finally:
out = df[m]
#OR
out = df.loc[m]
Note: If the city names in the DataFrame come in mixed case (uppercase, lowercase, or title case), you can pass the IGNORECASE flag from the re module, so the first method becomes:
from re import IGNORECASE
m = df['city'].str.contains('|'.join(cities), flags = IGNORECASE)
Finally:
out = df[m]
#OR
out = df.loc[m]
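A self-contained sketch of the case-insensitive variant, in case it helps. The mixed-case sample rows are hypothetical (the question's data is all lowercase). Two things here are swapped in rather than taken from the snippets above: re.escape, which guards against regex metacharacters in a city name, and case=False, which is str.contains' own parameter and works as an alternative to passing the IGNORECASE flag.

```python
import re

import pandas as pd

# Hypothetical mixed-case version of the question's data
df = pd.DataFrame({
    "city": ["Beijing city", "shanghai city", "GUANGZHOU",
             "Shenzhen City", "wuhan city", "xian city"],
    "value": [23, 34, 45, 56, 67, 78],
})
cities = ["beijing", "guangzhou", "shenzhen"]

# Escape each name so characters like '.' or '(' are matched literally,
# then join the names into one alternation pattern.
pattern = "|".join(re.escape(c) for c in cities)

# case=False makes the regex match case-insensitively.
m = df["city"].str.contains(pattern, case=False, regex=True)
out = df[m]
print(out)
```

Note that this is still a substring match, so a short list entry can match inside a longer city name.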
Upvotes: 1