The Unfun Cat

Reputation: 31898

Remove empty lists in pandas series

I have a long series like the following:

series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])

In [151]: series
Out[151]:
0    [(1, 2)]
1    [(3, 5)]
2          []
3    [(3, 5)]
dtype: object

I want to remove all entries with an empty list. For some reason, boolean indexing does not work.

The following tests both give the same error:

series == [[(1,2)]]
series == [(1,2)]

ValueError: Arrays were different lengths: 4 vs 1

This is strange, because in the simple example below the same kind of comparison works as expected:

In [146]: pd.Series([1,2,3]) == [3]
Out[146]:
0    False
1    False
2     True
dtype: bool

P.S. ideally, I'd like to split the tuples in the series into a DataFrame of two columns also.

Upvotes: 8

Views: 15528

Answers (3)

Meow

Reputation: 1267

Using the built-in apply, you can filter by the length of each list:

import pandas as pd
series = pd.Series([[(1,2)],[(3,5)],[],[(3,5)]])
# keep only the rows whose list has at least one element
series = series[series.apply(len) > 0]

Upvotes: 6

unutbu

Reputation: 879073

Your series is in a bad state -- having a Series of lists of tuples of ints buries the useful data, the ints, inside too many layers of containers.

However, to form the desired DataFrame, you could use

df = series.apply(lambda x: pd.Series(x[0]) if x else pd.Series()).dropna()

which yields

   0  1
0  1  2
1  3  5
2  3  5

A better way would be to avoid building the malformed series altogether and form df directly from the data:

data = [[(1,2)],[(3,5)],[],[(3,5)]]
data = [pair for row in data for pair in row]
df = pd.DataFrame(data)
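
As a runnable sketch of the flattening approach above: the list comprehension skips empty rows because there is nothing to iterate over in them. The column names 'a' and 'b' are an assumption added for illustration; the snippet above leaves the default integer column labels.

```python
import pandas as pd

data = [[(1, 2)], [(3, 5)], [], [(3, 5)]]

# Flatten one level: each inner list contributes its tuples,
# and an empty list contributes nothing.
pairs = [pair for row in data for pair in row]

# Column names are illustrative, not part of the original answer.
df = pd.DataFrame(pairs, columns=["a", "b"])
print(df)
```

Note that the resulting index is a fresh 0..2 range, so the position of the empty entry in the original data is not preserved.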

Upvotes: 4

Alex Riley

Reputation: 176730

You could check to see if the lists are empty using str.len():

series.str.len() == 0

and then use this boolean series to remove the rows containing empty lists.
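
A minimal sketch of that filtering step, using the example series from the question (the variable names is_empty and non_empty are just for illustration):

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# Boolean mask: True where the list is empty.
is_empty = series.str.len() == 0

# Invert the mask to keep only the rows with non-empty lists.
non_empty = series[~is_empty]
print(non_empty)
```

The original index labels are kept, so the filtered series has a gap where the empty list was; call reset_index(drop=True) if a contiguous index is wanted.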

If each of your entries is a list containing a two-tuple (or else empty), you could create a two-column DataFrame by using the str accessor twice (once to select the first element of the list, then to access the elements of the tuple):

pd.DataFrame({'a': series.str[0].str[0], 'b': series.str[0].str[1]})

Missing entries default to NaN with this method.
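
If the NaN rows are not wanted, one option is to drop them afterwards. This is a sketch assuming every non-empty list holds exactly one two-tuple of ints, as in the question; the final astype(int) is an added step to get an integer dtype back after the NaNs are gone.

```python
import pandas as pd

series = pd.Series([[(1, 2)], [(3, 5)], [], [(3, 5)]])

# First element of each list; empty lists yield NaN.
first = series.str[0]

# Split each tuple into two columns; NaN rows stay NaN.
df = pd.DataFrame({'a': first.str[0], 'b': first.str[1]})

# Drop the rows that came from empty lists and restore integer values.
df = df.dropna().astype(int)
print(df)
```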

Upvotes: 19
