Reputation: 3792
I have the following series:
import pandas as pd
s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])
>>> s
0 [a, b]
1 [c, d]
2 [f, g]
dtype: object
What is the easiest (preferably vectorized) way to concatenate all the lists in the series, so that I get:
l = ['a', 'b', 'c', 'd', 'f', 'g']
Thanks!
Upvotes: 12
Views: 9852
Reputation: 36043
I'm not timing or testing these options, but there's the new pandas method Series.explode (added in pandas 0.25), and also numpy.concatenate.
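A minimal sketch of both options applied to the series from the question. It assumes pandas 0.25 or later (for explode); the variable names are only illustrative.

```python
import numpy as np
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

# Series.explode puts each list element on its own row, repeating the
# original index; tolist() then discards that index.
via_explode = s.explode().tolist()

# numpy.concatenate joins a sequence of 1-D array-likes into one array
# (here an array of strings), which tolist() turns back into a Python list.
via_concatenate = np.concatenate(s.tolist()).tolist()

print(via_explode)      # ['a', 'b', 'c', 'd', 'f', 'g']
print(via_concatenate)  # ['a', 'b', 'c', 'd', 'f', 'g']
```

One difference worth knowing: explode turns an empty list into a NaN row, so with empty lists in the series the two results are not identical.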
Upvotes: 2
Reputation: 109696
A nested list comprehension should be much faster than s.sum().
>>> [element for list_ in s for element in list_]
['a', 'b', 'c', 'd', 'f', 'g']
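The for clauses in a nested comprehension read left to right in the same order as nested for loops. As a sketch, here is the comprehension unrolled into explicit loops; flatten_loops is a hypothetical helper name used only for illustration.

```python
import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

def flatten_loops(series):
    """Explicit-loop equivalent of [element for list_ in series for element in list_]."""
    result = []
    for list_ in series:        # outer clause: each list stored in the Series
        for element in list_:   # inner clause: each item of that list
            result.append(element)
    return result

flat = flatten_loops(s)
# Same result as the one-line comprehension.
assert flat == [element for list_ in s for element in list_]
```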
>>> %timeit -n 100000 [element for list_ in s for element in list_]
100000 loops, best of 3: 5.2 µs per loop
>>> %timeit -n 100000 s.sum()
100000 loops, best of 3: 50.7 µs per loop
Iterating directly over the underlying NumPy array (s.values) is even faster.
>>> %timeit -n 100000 [element for list_ in s.values for element in list_]
100000 loops, best of 3: 2.77 µs per loop
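%timeit is an IPython magic, so here is a plain-Python sketch of the same comparison using the standard timeit module. It uses s.to_numpy(), the newer spelling of s.values (pandas 0.24 and later). Absolute numbers depend on the machine and the pandas version, so no timings are claimed here.

```python
import timeit

import pandas as pd

s = pd.Series([['a', 'b'], ['c', 'd'], ['f', 'g']])

candidates = {
    "s.sum()": lambda: s.sum(),
    "comprehension over s": lambda: [e for list_ in s for e in list_],
    "comprehension over s.to_numpy()": lambda: [e for list_ in s.to_numpy() for e in list_],
}

# Check that every candidate builds the same flat list before timing it.
expected = ['a', 'b', 'c', 'd', 'f', 'g']
for name, fn in candidates.items():
    assert fn() == expected, name

N = 10_000
for name, fn in candidates.items():
    # Best of 3 repeats, each calling the candidate N times.
    best = min(timeit.repeat(fn, number=N, repeat=3))
    print(f"{name}: {best / N * 1e6:.2f} µs per loop")
```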
Upvotes: 16