BhishanPoudel

Reputation: 17164

Create stacked pandas series from series with list elements

I have a pandas Series whose elements are lists:

import pandas as pd
s = pd.Series([ ['United States of America'],['China', 'Hong Kong'], []])
print(s)

0    [United States of America]
1            [China, Hong Kong]
2                            []

How can I get a Series like the following?

0 United States of America
1 China
1 Hong Kong

I am not sure what should happen to index 2 (the empty list).

Upvotes: 4

Views: 258

Answers (4)

Calebe Piacentini

Reputation: 21

There is a simpler and probably far less computationally expensive way to do this: the pandas method explode. In your case, the answer would be:

s.explode()

It is as simple as that. Note that the empty list at index 2 becomes NaN. For a DataFrame with several columns, name the column you would like to "explode", for example df.explode('country').
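As a minimal sketch of what explode does with the Series from the question: each list element gets its own row, the original index label is repeated, and the empty list turns into a single NaN row. Dropping that row with dropna() is one possible choice, not something the question requires.

```python
import pandas as pd

s = pd.Series([['United States of America'], ['China', 'Hong Kong'], []])

# One row per list element; the index label of the source row is repeated.
exploded = s.explode()
print(exploded)

# The empty list at index 2 became NaN; drop it to match the desired output.
result = exploded.dropna()
print(result)
```

After dropna() the index is 0, 1, 1, which matches the layout asked for in the question.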

Upvotes: 1

cs95

Reputation: 402603

Most of the following options return a Series. First, build a new DataFrame from the lists and stack it:

pd.DataFrame(s.tolist()).stack()

0  0    United States of America
1  0                       China
   1                   Hong Kong
dtype: object

To reset the index, use

pd.DataFrame(s.tolist()).stack().reset_index(drop=True)

0    United States of America
1                       China
2                   Hong Kong
dtype: object
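If the original row labels should be kept instead, as in the question's desired output (0, 1, 1), one option is to drop only the inner level of the stacked index. This is a sketch rather than part of the original answer. The explicit dropna() is an added precaution, assuming a pandas version whose stack() may keep missing values.

```python
import pandas as pd

s = pd.Series([['United States of America'], ['China', 'Hong Kong'], []])

stacked = pd.DataFrame(s.tolist()).stack()

# Remove padding values (a no-op where stack() already dropped them),
# then discard the inner level so only the original row label remains.
result = stacked.dropna().reset_index(level=1, drop=True)
print(result)
```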

To convert to DataFrame, call to_frame()

pd.DataFrame(s.tolist()).stack().reset_index(drop=True).to_frame('countries')

                  countries
0  United States of America
1                     China
2                 Hong Kong

If you're trying to code golf, use

sum(s, [])
# ['United States of America', 'China', 'Hong Kong']

pd.Series(sum(s, []))

0    United States of America
1                       China
2                   Hong Kong
dtype: object

Or even,

import numpy as np
pd.Series(np.sum(s))

0    United States of America
1                       China
2                   Hong Kong
dtype: object

However, like most operations that sum lists, this performs poorly: each concatenation copies the accumulated list, so the total cost grows quadratically with the number of elements.


Faster operations are possible using chaining with itertools.chain:

from itertools import chain
pd.Series(list(chain.from_iterable(s)))

0    United States of America
1                       China
2                   Hong Kong
dtype: object

pd.DataFrame(list(chain.from_iterable(s)), columns=['countries'])

                  countries
0  United States of America
1                     China
2                 Hong Kong
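To check the performance claim on your own machine, a rough timing sketch like the following can help. The input `big` and the helper names `flatten_sum` and `flatten_chain` are made up for illustration, and the absolute timings will vary by machine and version.

```python
import timeit
from itertools import chain

import pandas as pd

# Hypothetical larger input: 2,000 rows, each holding a two-element list.
big = pd.Series([['a', 'b'] for _ in range(2000)])

def flatten_sum(series):
    # Each addition builds a new, longer list, so total work grows quadratically.
    return sum(series, [])

def flatten_chain(series):
    # Iterates over every element once and builds a single list.
    return list(chain.from_iterable(series))

# Both approaches should produce the same flat list.
same = flatten_sum(big) == flatten_chain(big)

t_sum = timeit.timeit(lambda: flatten_sum(big), number=5)
t_chain = timeit.timeit(lambda: flatten_chain(big), number=5)
print(f"same result: {same}  sum: {t_sum:.4f}s  chain: {t_chain:.4f}s")
```

The gap between the two timings should widen as the number of rows grows.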

Upvotes: 4

BENY

Reputation: 323306

Assuming each element is a list:

pd.Series(s.sum())
Out[103]: 
0    United States of America
1                       China
2                   Hong Kong
dtype: object

Upvotes: 2

U13-Forward

Reputation: 71580

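Or, for this specific input (it only works because there is exactly one value in the second column and one empty row), use: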

df = pd.DataFrame(s.tolist())
print(df[0].fillna(df[1].dropna().item()))

Output:

0    United States of America
1                       China
2                   Hong Kong
Name: 0, dtype: object

Upvotes: 2
