Blaszard

Reputation: 31965

How to check if an element is an empty list in pandas?

One of the columns in my df stores a list, and some of the rows have an empty list. For example:

[]

["X", "Y"]

[]

etc...

How can I take only the rows whose list is not empty?

The following code does not work.

df[df["col"] != []] # ValueError: Lengths must match to compare
df[pd.notnull(df["col"])] # The code doesn't issue an error but the result includes an empty list
df[len(df["col"]) != 0] # KeyError: True

Upvotes: 32

Views: 38758

Answers (6)

mozway

Reputation: 260600

df['col'] == [] doesn't work because a list on the right-hand side makes pandas attempt an element-wise comparison (nth item of df['col'] vs nth item of the list).

pd.Series(['a', 'b', 'c']) == [1, 'b', 3]

0    False
1     True   # second item matches
2    False
dtype: bool

To avoid this, wrap the list to test in a list and use isin:

df['col'].isin([[]])

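To filter (this keeps the rows whose list is empty; negate the mask with ~ to keep the non-empty rows instead):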

out = df[df['col'].isin([[]])]

Example:

df = pd.DataFrame({'col': ['a', ['b'], []]})

df['col'].isin([[]])

# 0    False
# 1    False
# 2     True
# Name: col, dtype: bool

df['col'].isin([['b']])

# 0    False
# 1     True
# 2    False
# Name: col, dtype: bool

df[df['col'].isin([[]])]

#   col
# 2  []

Upvotes: 0

piRSquared

Reputation: 294258

bool

An empty list in a boolean context is False. An empty list is what we call falsey. It does a programmer well to know what objects are falsey and truthy.

You can also slice a dataframe with a boolean list (not just a boolean series). And so, I'll use a comprehension to speed up the checking.

df[[bool(x) for x in df.col]]

Or with even fewer characters

df[[*map(bool, df.col)]]
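A minimal, self-contained sketch of the truthiness approach, in case it helps to see it run. The column name col follows the question; the sample data is made up for illustration, and it assumes every cell holds a list.

```python
import pandas as pd

# Hypothetical sample data mirroring the question: two empty lists, one non-empty.
df = pd.DataFrame({"col": [[], ["X", "Y"], []]})

# bool([]) is False and bool(["X", "Y"]) is True, so this is a plain
# Python list of booleans, one per row.
mask = [bool(x) for x in df["col"]]

# A boolean list of the same length as the frame selects rows.
non_empty = df[mask]
print(non_empty)
```

Only the row at index 1 survives, since it is the only one whose list is truthy.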

Upvotes: 4

GZ0

Reputation: 4263

This is probably the most efficient solution.

df[df["col"].astype(bool)]
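For reference, a small sketch of what that one-liner does on made-up data (the frame below is an assumption for illustration, and it assumes every cell holds a list). Converting the object column to bool applies Python truthiness to each element, so empty lists become False.

```python
import pandas as pd

# Hypothetical sample data: lists of varying length, two of them empty.
df = pd.DataFrame({"col": [[], ["X", "Y"], []]})

# Each list is converted by its truthiness: [] -> False, non-empty -> True.
mask = df["col"].astype(bool)

# The boolean Series is then used as a row filter.
non_empty = df[mask]
print(mask.tolist())
print(non_empty)
```

One thing to keep in mind: the mask is based on truthiness, not on length, so a cell holding NaN (which is truthy in Python) would be kept rather than dropped.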

Upvotes: 34

41 72 6c

Reputation: 1780

You could get the length of each list with str.len() and keep the rows where it is not zero:

df[df["col"].str.len() != 0]
...

Despite its name, str.len() is not limited to strings: it computes the length of each element, which may be a string, list, tuple or dict.

And your output should be the expected one.

Upvotes: 0

Quang Hoang

Reputation: 150735

Try this:

df[df['col'].apply(len).gt(0)]
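A short sketch of how this reads step by step, on made-up data (the frame is an assumption for illustration): apply(len) calls the built-in len on every cell, and gt(0) turns the resulting lengths into a boolean mask.

```python
import pandas as pd

# Hypothetical sample data: two non-empty lists and one empty list.
df = pd.DataFrame({"col": [[1], [], [2, 3, 4]]})

# Built-in len is called once per cell, giving an integer Series.
lengths = df["col"].apply(len)

# gt(0) is the method form of "> 0".
mask = lengths.gt(0)

non_empty = df[mask]
print(non_empty)
```

Because len is called directly, this assumes every cell is something with a length; a cell such as a float NaN would raise a TypeError.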

Upvotes: 18

javidcf

Reputation: 59701

You can do this:

df[df["col"].str.len() != 0]

Example:

import pandas as pd

df = pd.DataFrame({"col": [[1], [2, 3], [], [4, 5, 6], []]}, dtype=object)
print(df[df["col"].str.len() != 0])
#          col
# 0        [1]
# 1     [2, 3]
# 3  [4, 5, 6]

Upvotes: 44
