Reputation: 637
I'm having an issue with pandas 0.19.2 not giving me the result I expect. Columns a-g contain the values ['yes', 'no', '', NaN]. If any of these columns contains 'yes', I want the row returned (there are other columns not shown). Here is my code:
xdf2 = xdf[((xdf['a'] == 'yes').all() or
(xdf['b'] == 'yes').all() or
(xdf['c'] == 'yes').all() or
(xdf['d'] == 'yes' ).all() or
(xdf['e'] == 'yes').all() or
(xdf['f'] == 'yes').all() or
(xdf['g'] =='yes').all()) ]
This gives me the following error:
2134 return self._engine.get_loc(key)
2135 except KeyError:
-> 2136 return self._engine.get_loc(self._maybe_cast_indexer(key))
2137
2138 indexer = self.get_indexer([key], method=method, tolerance=tolerance)
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4433)()
pandas\index.pyx in pandas.index.IndexEngine.get_loc (pandas\index.c:4279)()
pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13742)()
pandas\src\hashtable_class_helper.pxi in pandas.hashtable.PyObjectHashTable.get_item (pandas\hashtable.c:13696)()
KeyError: False
Without the '.all()' calls I get:
ValueError: The truth value of a Series is ambiguous. Use a.empty, a.bool(), a.item(), a.any() or a.all().
This seems like a simple and common code snippet, but I haven't found a good example. What am I missing?
Upvotes: 1
Views: 241
Reputation: 339795
The following selects the columns whose values are all "yes" (note that it filters columns, not rows):
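import pandas as pd
# Sample data: three rows, four columns of "yes"/"no" values
a = [["yes", "no", "yes", "yes"],
     ["yes", "yes", "no", "yes"],
     ["yes", "no", "yes", "yes"]]
xdf = pd.DataFrame(a, columns=["a", "b", "c", "d"])
print(xdf)
# One boolean per column: True if every value in that column is "yes"
boollist = [(xdf[col] == "yes").all() for col in xdf.columns]
# Keep only the columns flagged True
xdf2 = xdf[xdf.columns[boollist]]
print(xdf2)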
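For the row filter the question asks for (keep a row if any of the flag columns is 'yes'), a minimal sketch follows. The two errors in the question come from reducing too early: `.all()` collapses each comparison to a single bool, so `xdf[False]` is treated as a lookup of a column named False, and without `.all()` the Python `or` tries to take the truth value of a whole Series. Comparing element-wise and reducing per row with `any(axis=1)` avoids both. The sample data, the `other` column, and the names `flag_cols` and `mask` are assumptions made up for illustration.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: flag columns hold 'yes', 'no', '' or NaN;
# 'other' stands in for the columns not shown in the question.
xdf = pd.DataFrame({
    "a": ["yes", "no", "", np.nan],
    "b": ["no", "no", "yes", np.nan],
    "c": ["no", "", "no", "no"],
    "other": [1, 2, 3, 4],
})

# Columns to test (a-g in the question; shortened here)
flag_cols = ["a", "b", "c"]

# Element-wise comparison yields a boolean DataFrame (NaN == "yes" is False);
# any(axis=1) reduces each row to one boolean.
mask = (xdf[flag_cols] == "yes").any(axis=1)

# Boolean indexing with a row mask keeps all columns of the matching rows
xdf2 = xdf[mask]
print(xdf2)
```

The same mask can be built by joining the per-column comparisons with the element-wise `|` operator instead of `or`, for example `(xdf['a'] == 'yes') | (xdf['b'] == 'yes')`, with each comparison in parentheses.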
Upvotes: 0