Reputation: 497

How can one filter a dataframe based on rows containing specific value (in any of the columns)

I need to limit a dataset so that it returns only rows that contain specific string, however, that string can exist in many (8) of the columns.

How can I do this? Ive seen str.isin methods, but it returns a single series for a single row. How can I remove any rows that contain the string in ANY of the columns.

Example code If I had the dataframe df generated by

 import pandas as pd
    data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015], 
        'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
        'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
    df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
    df    

   year  year2  reports
a  2011   2012       52
b  2012   2016       20
c  2013   2015       43
d  2014   2015       33
e  2014   2012       41
f  2011   2013       11
g  2012   2019       43
h  2015   2016       72

I want the code to remove rows all rows that do not contain the value 2012. Note that in my actual dataset, it is a string, not an int (it is peoples names) so in the above code it would remove rows c, d, f, and h.

Upvotes: 2

Answers (3)

sd191028

Reputation: 78

To find the dataframe made of the rows that have the value

df[(df == '2012').all(axis=1)]

To find the dataframe made of the rows that do not have the value

df[~(df == '2012').all(axis=1)]

df[(df != '2012').all(axis=1)]

Upvotes: 0

anky

Reputation: 75080

you can use df.eq with df.any on axis=1:

df[df.eq('2012').any(1)] #for year as string

Or:

df[df.eq(2012).any(1)] #for year as int

   year  year2  reports
a  2011   2012       52
b  2012   2016       20
e  2014   2012       41
g  2012   2019       43

Upvotes: 9

Zakiul Fahmi Jailani

Reputation: 13

try simple code like this:

import pandas as pd
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015], 
'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.drop(['c', 'd', 'f', 'h'])

df

it will give you dataframe like this:

   year  year2  reports
a  2011   2012       52
b  2012   2016       20
e  2014   2012       41
g  2012   2019       43

Upvotes: 0

How can one filter a dataframe based on rows containing specific value (in any of the columns)

Answers (3)

Related Questions