Filtering rows on multiple string conditions in the same column

I want to filter a dataframe on multiple conditions. Say I have a column called 'detail'. I first normalize its values like this:

detail = unidecode.unidecode(str(row['detail']).lower())

Now that every 'detail' value is unidecoded and lowercased, I want to extract the rows that start with some substring, for example:

detail.startswith('bomb')

Finally, I also want to keep the rows where another integer column, 'family', equals 100.

I tried to do this but obviously it doesn't work:

llista_dfs['df_bombes'] = df_filtratge[df_filtratge['detail'].str.lower().startswith('bomb') or df_filtratge['family']==100]

The line above is what I would like to execute, but I'm not sure what the syntax is to achieve this in a single line of code (if that's possible).

Here is an example of what the code should do:

Initial table:

    detail            family
0   bòmba             90
1   boMbá             87
2   someword          100
3   someotherword     65
4   Bombá             90

Result table:

    detail             family
0   bòmba              90
1   boMbá              87
2   someword           100
4   Bombá              90

Upvotes: 0

Views: 815

Answers (1)

rpanai

Reputation: 13437

@user3483203's comment has the right fix: to combine boolean filters in pandas you use & and | rather than and and or. If you also want to drop the unidecode dependency, you can normalize the strings with pandas itself:

import pandas as pd

txt="""0   bòmba             90
1   boMbá             87
2   someword          100
3   someotherword     65
4   Bombá             90"""

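# Parse the sample text: split each line on whitespace and drop the index column
rows = [t.split()[1:] for t in txt.split("\n")]

df = pd.DataFrame(rows, columns=["detail", "family"])
df["family"] = df["family"].astype(int)

# Strip accents (NFKD decomposition, then drop non-ASCII bytes), lowercase, test the prefix
cond1 = (df["detail"].str.normalize("NFKD")
                     .str.encode("ascii", errors="ignore")
                     .str.decode("utf-8")
                     .str.lower()
                     .str.startswith("bomb"))

cond2 = df["family"] == 100

# | is the element-wise OR; plain `or` does not work on Series
df[cond1 | cond2]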
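
For the single-line form the question asks about, here is a minimal sketch, assuming the question's names (df_filtratge, 'detail', 'family') and rebuilding its sample table by hand. Two things break the original attempt: a Series has no .startswith method of its own (it lives under the .str accessor), and `or` tries to reduce each whole Series to one boolean, which pandas rejects. When the comparison is written inline it needs parentheses, because | binds more tightly than ==.

```python
import pandas as pd

# Sample data copied from the question's "Initial table".
df_filtratge = pd.DataFrame({
    "detail": ["bòmba", "boMbá", "someword", "someotherword", "Bombá"],
    "family": [90, 87, 100, 65, 90],
})

# Accent-stripped, lowercased copy of the column (no unidecode needed).
detail_norm = (
    df_filtratge["detail"]
    .str.normalize("NFKD")
    .str.encode("ascii", errors="ignore")
    .str.decode("ascii")
    .str.lower()
)

# The filter itself on one line: element-wise OR of the two conditions.
# The == comparison is parenthesized because | has higher precedence than ==.
df_bombes = df_filtratge[detail_norm.str.startswith("bomb") | (df_filtratge["family"] == 100)]

print(df_bombes)
```

On the sample data this keeps every row except index 3 ("someotherword", 65). The result can then be stored wherever it is needed, for example in the question's llista_dfs dictionary under the 'df_bombes' key.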

Upvotes: 1
