Reputation: 497
I need to limit a dataset so that it returns only rows that contain specific string, however, that string can exist in many (8) of the columns.
How can I do this? Ive seen str.isin methods, but it returns a single series for a single row. How can I remove any rows that contain the string in ANY of the columns.
Example code If I had the dataframe df generated by
import pandas as pd
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df
year year2 reports
a 2011 2012 52
b 2012 2016 20
c 2013 2015 43
d 2014 2015 33
e 2014 2012 41
f 2011 2013 11
g 2012 2019 43
h 2015 2016 72
I want the code to remove rows all rows that do not contain the value 2012. Note that in my actual dataset, it is a string, not an int (it is peoples names)
so in the above code it would remove rows c, d, f, and h.
Upvotes: 2
Views: 4662
Reputation: 78
To find the dataframe made of the rows that have the value
df[(df == '2012').all(axis=1)]
To find the dataframe made of the rows that do not have the value
df[~(df == '2012').all(axis=1)]
or
df[(df != '2012').all(axis=1)]
See the related https://stackoverflow.com/a/35682788/12411517.
Upvotes: 0
Reputation: 75080
you can use df.eq
with df.any
on axis=1
:
df[df.eq('2012').any(1)] #for year as string
Or:
df[df.eq(2012).any(1)] #for year as int
year year2 reports
a 2011 2012 52
b 2012 2016 20
e 2014 2012 41
g 2012 2019 43
Upvotes: 9
Reputation: 13
try simple code like this:
import pandas as pd
data = {'year': [2011, 2012, 2013, 2014, 2014, 2011, 2012, 2015],
'year2': [2012, 2016, 2015, 2015, 2012, 2013, 2019, 2016],
'reports': [52, 20, 43, 33, 41, 11, 43, 72]}
df = pd.DataFrame(data, index = ['a', 'b', 'c', 'd', 'e', 'f', 'g', 'h'])
df = df.drop(['c', 'd', 'f', 'h'])
df
it will give you dataframe like this:
year year2 reports
a 2011 2012 52
b 2012 2016 20
e 2014 2012 41
g 2012 2019 43
Upvotes: 0