Reputation: 31965
One of the columns in my df stores lists, and in some of the rows the list is empty. For example:
[]
["X", "Y"]
[]
etc...
How can I select only the rows whose list is not empty?
None of the following attempts work.
df[df["col"] != []] # ValueError: Lengths must match to compare
df[pd.notnull(df["col"])] # The code doesn't issue an error but the result includes an empty list
df[len(df["col"]) != 0] # KeyError: True
Upvotes: 32
Views: 38758
Reputation: 260600
df['col'] == []
doesn't work because passing a list on the right-hand side makes pandas attempt an element-wise comparison (nth item of df['col'] vs nth item of the list).
pd.Series(['a', 'b', 'c']) == [1, 'b', 3]
0 False
1 True # second item matches
2 False
dtype: bool
To avoid this, wrap the list to test in another list and use isin:
df['col'].isin([[]])
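To filter (this keeps the rows whose list is empty; prefix the mask with ~ to keep the non-empty rows instead):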
out = df[df['col'].isin([[]])]
Example:
df = pd.DataFrame({'col': ['a', ['b'], []]})
df['col'].isin([[]])
# 0 False
# 1 False
# 2 True
# Name: col, dtype: bool
df['col'].isin([['b']])
# 0 False
# 1 True
# 2 False
# Name: col, dtype: bool
df[df['col'].isin([[]])]
# col
# 2 []
Upvotes: 0
Reputation: 294258
bool
An empty list in a boolean context is False. An empty list is what we call falsey. It does a programmer well to know which objects are falsey and which are truthy.
You can also index a DataFrame with a boolean list (not just a boolean Series), so I'll build the mask with a list comprehension, which keeps the check fast.
df[[bool(x) for x in df.col]]
Or with even fewer characters
df[[*map(bool, df.col)]]
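For illustration, here is a minimal sketch of the boolean-list approach on a small made-up frame. The column name col follows the question; the data and the names mask and non_empty are assumptions chosen for the example.

```python
import pandas as pd

# Illustrative data: two non-empty lists and two empty ones.
df = pd.DataFrame({"col": [["X", "Y"], [], ["Z"], []]})

# bool([]) is False and bool() of any non-empty list is True,
# so this builds one True/False flag per row.
mask = [bool(x) for x in df["col"]]

# Indexing with a boolean list keeps the rows flagged True.
non_empty = df[mask]
print(non_empty)
```

Note that truthiness applies to whatever is stored in the cell, so other falsey values such as None or 0 would be dropped as well.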
Upvotes: 4
Reputation: 4263
This is probably the most efficient solution.
df[df["col"].astype(bool)]
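A short sketch of how this plays out, assuming a made-up frame (the data and the names mask, non_empty and empty are illustrative only). Casting the object column to bool evaluates the truthiness of each list, and negating the mask with ~ selects the empty rows instead.

```python
import pandas as pd

# Illustrative data only; the column name follows the question.
df = pd.DataFrame({"col": [[1], [], [2, 3], []]})

# Casting an object column to bool applies truthiness per element:
# empty lists become False, non-empty lists become True.
mask = df["col"].astype(bool)

non_empty = df[mask]   # rows whose list has at least one item
empty = df[~mask]      # rows whose list is empty

print(non_empty)
print(empty)
```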
Upvotes: 34
Reputation: 1780
You could compute the length of each list with str.len() and keep the rows where it is not zero:
df[df["col"].str.len() != 0]
...
str.len computes the length of each element in the column; despite the name, it works on lists as well as strings.
The result then contains only the rows whose list is not empty.
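One detail worth checking is how missing values behave. The sketch below assumes a made-up frame with one None entry (the data and variable names are illustrative). str.len() returns NaN for the missing row, and NaN != 0 evaluates to True, so that row survives the != 0 filter; comparing with > 0 drops it.

```python
import pandas as pd

# Illustrative data: a non-empty list, an empty list, and a missing value.
df = pd.DataFrame({"col": [[1, 2], [], None]})

# Element-wise length; the missing entry yields NaN.
lengths = df["col"].str.len()

# NaN != 0 is True, so the row holding None is kept here.
kept_ne = df[lengths != 0]

# NaN > 0 is False, so only rows with a real, non-empty list remain.
kept_gt = df[lengths > 0]

print(kept_ne)
print(kept_gt)
```

Which comparison is appropriate depends on whether rows with missing values should count as empty.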
Upvotes: 0
Reputation: 59701
You can do this:
df[df["col"].str.len() != 0]
Example:
import pandas as pd
df = pd.DataFrame({"col": [[1], [2, 3], [], [4, 5, 6], []]}, dtype=object)
print(df[df["col"].str.len() != 0])
# col
# 0 [1]
# 1 [2, 3]
# 3 [4, 5, 6]
Upvotes: 44