john taylor
john taylor

Reputation: 1100

Search multiple conditions using str.contains in Dataframe using And Or to fill sparse results

Is there a way to get '&' or 'or' results for sorting multiple columns with string contains?

For example below if the genre column contains rock and demographics contain 30-50 and city contains Gainesville, I get the results that match those strings in the data frame.

Which I understand is the correct results genre column contains rock and or demographics contain 30-50 and/or city contains Gainesville I get the results that match those strings in the data frame?

This way to also add to the sparse results for example will populate 5 results in the .head(5) with the full matches plus the partial matches for the secondary requirements.

The genre has to contain the str rock and the demo may contain the str 35-50 and or the city contains the str Gainesville or not.

df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50')) & (df.city.str.contains('Gainesville'))].head(4)

    name    genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville

The desired result would be something like this

    name    genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
6   Mary Porter     jazz, rock  35-50   100-500, 500-100    Hendersonville

This is what I was able to

import pandas as pd

dataframe to test

df = pd.DataFrame({'name':['Alex Smith','Mike Stevens','Brenda West','Holy Kent','Bob West','Sally May','Mary Porter','John Keys'], 
                   'genre': ['rock','pop, rock','jazz',"dj",'rock','pop','jazz, rock',"dj"],'demo':['18-25,25-35,35-50','18-25,25-35,35-50, 50+','35-50','18-25','18-25,25-35,35-50','18-25,35-50, 50+','35-50','18-25'],
                   'price':['100-500','100-500','100-500, 500-100','1000+','100-500','100-500', '100-500, 500-100','1000+'],
                   'city':['Gainesville','Somerville','Hendersonville','Pluto','Gainesville','Somerville','Hendersonville','Pluto']})
df.head(10)

     name   genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
2   Brenda West     jazz    35-50   100-500, 500-100    Hendersonville
3   Holy Kent   dj  18-25   1000+   Pluto
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
5   Sally May   pop     18-25,35-50, 50+    100-500     Somerville
6   Mary Porter     jazz, rock  35-50   100-500, 500-100    Hendersonville
7   John Keys   dj  18-25   1000+   Pluto

This gives me the result of the keyword rock in genre and demo of 30-50

# This gives me the result of the keyword rock in genre and demo of 30-50

df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50'))]

    name   genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville
6   Mary Porter     jazz, rock  35-50   100-500, 500-100    Hendersonville
**This gives me the result of the keyword rock in genre and demo of 30-50**

# This limits the results to 3 of the keyword in genre and demo of 30-50

df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50'))].head(3)


   name    genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
1   Mike Stevens    pop, rock   18-25,25-35,35-50, 50+  100-500     Somerville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville

This gives me results for rock with demo 30-50 sorted by city

# best results for rock with demo 30-50 sorted by city by 3 

df[(df.genre.str.contains('rock')) & (df.demo.str.contains('35-50')) & (df.city.str.contains('Gainesville'))].head(3)


    name   genre   demo    price   city
0   Alex Smith  rock    18-25,25-35,35-50   100-500     Gainesville
4   Bob West    rock    18-25,25-35,35-50   100-500     Gainesville

Upvotes: 0

Views: 44

Answers (1)

BENY
BENY

Reputation: 323226

Check with or not and

df[(df.genre.str.contains('rock')) | (df.demo.str.contains('35-50')) | (df.city.str.contains('Gainesville'))].head(4)

Upvotes: 2

Related Questions