Reputation: 477
I want to filter a dataframe on multiple conditions. Let's say I have a column called 'detail'. I want to get a dataframe where the 'detail' column values, after this normalization, match a prefix:
detail = unidecode.unidecode(str(row['detail']).lower())
So now I have every 'detail' value unidecoded and lowercased. Then I want to extract the rows that start with some substring, like:
detail.startswith('bomb')
And finally also take the rows where another integer column equals 100.
I tried to do this but obviously it doesn't work:
llista_dfs['df_bombes'] = df_filtratge[df_filtratge['detail'].str.lower().startswith('bomb') or df_filtratge['family']==100]
The line above is what I would like to execute, but I'm not sure what syntax would achieve this in a single line of code (if that's possible).
Here is an example of what the code should do:
Initial table:
   detail         family
0  bòmba          90
1  boMbá          87
2  someword       100
3  someotherword  65
4  Bombá          90
Result table:
   detail         family
0  bòmba          90
1  boMbá          87
2  someword       100
4  Bombá          90
Upvotes: 0
Views: 815
Reputation: 13437
Actually, @user3483203's comment is the right solution: to filter in pandas you use & and | instead of and and or. In any case, if you want to get rid of unidecode, you might use this solution:
import pandas as pd

txt = """0 bòmba 90
1 boMbá 87
2 someword 100
3 someotherword 65
4 Bombá 90"""

# Split each line on spaces, drop empty tokens and the leading index column
df = [list(filter(lambda x: x != '', t.split(' ')))[1:]
      for t in txt.split("\n")]
df = pd.DataFrame(df, columns=["details", 'family'])
df["family"] = df["family"].astype(int)

# Decompose accented characters, drop the non-ASCII marks, then lowercase
cond1 = df["details"].str.normalize('NFKD')\
                     .str.encode('ascii', errors='ignore')\
                     .str.decode('utf-8')\
                     .str.lower()\
                     .str.startswith('bomba')
cond2 = df["family"] == 100

# | is the element-wise "or" for boolean Series
df[cond1 | cond2]
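If you want it closer to the single line from the question, something like the sketch below should work. It reuses the question's names (df_filtratge, 'detail', 'family') and the 'bomb' prefix; the variable names starts_with_bomb and df_bombes are only illustrative. It assumes the NFKD normalization is an acceptable stand-in for unidecode, which holds for this sample data but may not for every character unidecode handles.

```python
import pandas as pd

# Sample data copied from the question
df_filtratge = pd.DataFrame({
    "detail": ["bòmba", "boMbá", "someword", "someotherword", "Bombá"],
    "family": [90, 87, 100, 65, 90],
})

# Strip accents (NFKD + ASCII round trip), lowercase, then test the prefix
starts_with_bomb = (
    df_filtratge["detail"]
    .str.normalize("NFKD")
    .str.encode("ascii", errors="ignore")
    .str.decode("ascii")
    .str.lower()
    .str.startswith("bomb")
)

# The comparison is wrapped in parentheses because | binds tighter than ==
df_bombes = df_filtratge[starts_with_bomb | (df_filtratge["family"] == 100)]
print(df_bombes)
```

Note the parentheses around the == comparison when it is written inline: without them Python evaluates the | first and the expression fails or gives the wrong mask. On the sample data this keeps rows 0, 1, 2 and 4.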
Upvotes: 1