Alejandro Simkievich

Reputation: 3792

Efficient concatenation of lists in pandas series

I have the following series:

>>> import pandas as pd
>>> s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0    [a, b]
1    [c, d]
2    [f, g]
dtype: object

What is the easiest (preferably vectorized) way to concatenate all the lists in the series, so that I get:

l = ['a', 'b', 'c', 'd', 'f', 'g']

Thanks!

Upvotes: 12

Views: 9852

Answers (2)

Alex Hall

Reputation: 36043

I haven't timed or tested these options, but there's the newer pandas method Series.explode, and also numpy.concatenate.
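For illustration, here is a minimal sketch of how those two options might be applied to the series from the question. It assumes pandas 0.25 or later (where explode was introduced); the variable names via_explode and via_concatenate are just placeholders chosen for this example.

```python
import numpy as np
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Series.explode gives every list element its own row;
# tolist() then converts the resulting Series to a plain Python list.
via_explode = s.explode().tolist()

# numpy.concatenate joins the sequence of lists into a single 1-D array;
# tolist() converts that array back to a plain Python list.
via_concatenate = np.concatenate(s.tolist()).tolist()

print(via_explode)
print(via_concatenate)
```

Two caveats worth keeping in mind: explode turns an empty list into a NaN row rather than dropping it, and numpy.concatenate coerces the elements to a common dtype, so lists with mixed types may not round-trip unchanged.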

Upvotes: 2

Alexander

Reputation: 109696

A nested list comprehension should be much faster than s.sum().

>>> [element for list_ in s for element in list_]
['a', 'b', 'c', 'd', 'f', 'g']

>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop

>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop

Iterating directly over the underlying values of the series is even faster.

>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop
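The %timeit lines above only work inside IPython. As a rough sketch, the same comparison could be reproduced in a plain script with the standard timeit module. The helper names (flatten_series, flatten_values, flatten_sum) and the loop count are assumptions made for this example, and the absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])


def flatten_series(series):
    # Nested comprehension iterating over the Series itself.
    return [element for list_ in series for element in list_]


def flatten_values(series):
    # Same comprehension, but iterating over the underlying NumPy array.
    return [element for list_ in series.values for element in list_]


def flatten_sum(series):
    # Relies on list + list concatenation via the object-dtype sum.
    return series.sum()


loops = 1000  # hypothetical loop count, much smaller than in the answer
for func in (flatten_series, flatten_values, flatten_sum):
    seconds = timeit.timeit(lambda: func(s), number=loops)
    print(f"{func.__name__}: {seconds / loops * 1e6:.2f} µs per loop")
```

Wrapping each call in a function and a lambda adds a little overhead to every measurement, so treat the output as a relative comparison rather than a match for the figures quoted above.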

Upvotes: 16
