Reputation: 11114
Suppose I have a dataframe, d, which has a column containing Python arrays as the values.
>>> d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a','b'])
>>> print d
     a      b
0  foo  [bar]
1  biz     []
Now, I want to filter out those rows which have empty arrays.
I have tried various versions, but no luck so far:
Trying to check it as a 'truthy' value:
>>> d[d['b']]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2682, in __getitem__
    return self._getitem_array(key)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2726, in _getitem_array
    indexer = self.loc._convert_to_indexer(key, axis=1)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/indexing.py", line 1314, in _convert_to_indexer
    indexer = check = labels.get_indexer(objarr)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3259, in get_indexer
    indexer = self._engine.get_indexer(target._ndarray_values)
  File "pandas/_libs/index.pyx", line 301, in pandas._libs.index.IndexEngine.get_indexer
  File "pandas/_libs/hashtable_class_helper.pxi", line 1544, in pandas._libs.hashtable.PyObjectHashTable.lookup
TypeError: unhashable type: 'list'
Trying an explicit length check. It seems len() is being applied to the series, not the value of the data.
>>> d[ len(d['b']) > 0 ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2688, in __getitem__
    return self._getitem_column(key)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/frame.py", line 2695, in _getitem_column
    return self._get_item_cache(key)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/generic.py", line 2489, in _get_item_cache
    values = self._data.get(item)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/internals.py", line 4115, in get
    loc = self.items.get_loc(item)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/indexes/base.py", line 3080, in get_loc
    return self._engine.get_loc(self._maybe_cast_indexer(key))
  File "pandas/_libs/index.pyx", line 140, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/index.pyx", line 162, in pandas._libs.index.IndexEngine.get_loc
  File "pandas/_libs/hashtable_class_helper.pxi", line 1492, in pandas._libs.hashtable.PyObjectHashTable.get_item
  File "pandas/_libs/hashtable_class_helper.pxi", line 1500, in pandas._libs.hashtable.PyObjectHashTable.get_item
KeyError: True
Comparing to an empty array directly, just as we might compare to an empty string (which, by the way, does work if we use strings rather than arrays):
>>> d[ d['b'] == [] ]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/ops.py", line 1283, in wrapper
    res = na_op(values, other)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/ops.py", line 1143, in na_op
    result = _comp_method_OBJECT_ARRAY(op, x, y)
  File "/home/myname/.local/lib/python2.7/site-packages/pandas/core/ops.py", line 1120, in _comp_method_OBJECT_ARRAY
    result = libops.vec_compare(x, y, op)
  File "pandas/_libs/ops.pyx", line 128, in pandas._libs.ops.vec_compare
ValueError: Arrays were different lengths: 2 vs 0
Upvotes: 5
Views: 8765
Reputation: 11114
Scott's answer is better, but just for others' knowledge, another option is to use a tuple rather than a list, and check against an empty tuple directly.
d[d['b'] != ()]
Which gives:
     a       b
0  foo  (bar,)
This doesn't work with lists; see the last error in the original question.
Upvotes: 0
Reputation: 51165
Empty lists evaluate to False, so you can filter with all along axis 1. Note that this also drops any row containing some other falsy value (unless you want to drop those rows as well).
d[d.all(1)]
     a      b
0  foo  [bar]
If you only want to filter using column b, you can use astype:
d[d.b.astype(bool)]
     a      b
0  foo  [bar]
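For anyone who wants to try the astype approach in isolation, here is a minimal sketch. It assumes a reasonably recent pandas; the extra 'baz' row holding None is my own addition (not in the question) to show that other falsy values in column b are dropped along with the empty list.

```python
import pandas as pd

# The question's frame plus one hypothetical extra row whose 'b' value is None.
d = pd.DataFrame(
    [['foo', ['bar']], ['biz', []], ['baz', None]],
    columns=['a', 'b'],
)

# astype(bool) applies Python truthiness to each element of the object column:
# a non-empty list is True, an empty list and None are False.
mask = d['b'].astype(bool)

# Boolean indexing keeps only the rows where the mask is True.
filtered = d[mask]
print(filtered)
```

If None (or 0, or an empty string) should be kept, this mask is too aggressive and an explicit length check is the safer choice.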
Upvotes: 5
Reputation: 153460
Use the string accessor, .str, to check the length of each list in the pandas Series:
d[d.b.str.len()>0]
Output:
     a      b
0  foo  [bar]
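A self-contained sketch of the same idea, assuming the frame from the question. It also shows why the len(d['b']) attempt in the question failed: the built-in len() returns the number of rows in the Series, whereas .str.len() (or, as an alternative I am adding here, map(len)) computes a length per element.

```python
import pandas as pd

d = pd.DataFrame([['foo', ['bar']], ['biz', []]], columns=['a', 'b'])

# Built-in len() on a Series gives the number of rows, not per-element lengths.
n_rows = len(d['b'])

# .str.len() computes the length of each element; it accepts lists as well as strings.
lengths = d['b'].str.len()
non_empty = d[lengths > 0]

# Alternative: call len on every element explicitly.
# Unlike .str.len(), this raises a TypeError if the column contains missing values.
non_empty_map = d[d['b'].map(len) > 0]

print(non_empty)
```

Both masks select the same rows here; .str.len() is the more forgiving of the two when the column may contain NaN.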
Upvotes: 8