Reputation: 17164
I have a pandas Series whose elements are lists:
import pandas as pd
s = pd.Series([ ['United States of America'],['China', 'Hong Kong'], []])
print(s)
0 [United States of America]
1 [China, Hong Kong]
2 []
How can I get a Series like the following?
0 United States of America
1 China
1 Hong Kong
I am not sure what should happen to row 2 (the empty list).
Upvotes: 4
Views: 258
Reputation: 21
A simpler and probably much less computationally expensive way to do that is the pandas explode method (added in pandas 0.25). In your case, the answer would be:
s.explode()
It is as simple as that. Note that the empty list in row 2 becomes NaN. For a DataFrame with several columns, you can specify which column to "explode" by passing its name as a string, for example df.explode('country').
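As a minimal sketch of how explode behaves on the question's data, assuming pandas 0.25 or later: the DataFrame `df` and its `id` and `country` columns below are hypothetical and only illustrate the multi-column case.

```python
import pandas as pd

s = pd.Series([['United States of America'], ['China', 'Hong Kong'], []])

# Each list element gets its own row; the original index label is repeated.
exploded = s.explode()

# The empty list in row 2 becomes NaN; drop it and renumber if it is not wanted.
flat = exploded.dropna().reset_index(drop=True)

# Hypothetical DataFrame with a list-valued column named 'country'.
df = pd.DataFrame({'id': [10, 20, 30], 'country': s})
df_flat = df.explode('country')  # other columns are repeated per element
```

Whether to keep or drop the NaN row depends on whether row 2 should remain visible in the result.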
Upvotes: 1
Reputation: 402603
The following options all return a Series. First, build a new DataFrame from the lists and stack it:
pd.DataFrame(s.tolist()).stack()
0 0 United States of America
1 0 China
1 Hong Kong
dtype: object
To reset the index, use
pd.DataFrame(s.tolist()).stack().reset_index(drop=True)
0 United States of America
1 China
2 Hong Kong
dtype: object
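If the original row labels should be kept instead, as in the question's desired output (index 0, 1, 1), one possible sketch is to drop the inner level of the stacked index rather than resetting it. The explicit dropna() is an assumption added here so the result does not depend on whether the installed pandas version's stack() discards missing values by itself.

```python
import pandas as pd

s = pd.Series([['United States of America'], ['China', 'Hong Kong'], []])

# stack() produces a two-level index: (original row, position within the list).
stacked = pd.DataFrame(s.tolist()).stack().dropna()

# Dropping the inner level leaves only the original row labels.
kept_labels = stacked.droplevel(1)
```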
To convert to DataFrame, call to_frame()
pd.DataFrame(s.tolist()).stack().reset_index(drop=True).to_frame('countries')
countries
0 United States of America
1 China
2 Hong Kong
If you're trying to code golf, use
sum(s, [])
# ['United States of America', 'China', 'Hong Kong']
pd.Series(sum(s, []))
0 United States of America
1 China
2 Hong Kong
dtype: object
Or even (after import numpy as np),
pd.Series(np.sum(s))
0 United States of America
1 China
2 Hong Kong
dtype: object
However, like most operations that sum lists, this performs poorly: every concatenation builds a new list, so the total cost grows quadratically with the number of elements.
Faster operations are possible by chaining with itertools.chain:
from itertools import chain
pd.Series(list(chain.from_iterable(s)))
0 United States of America
1 China
2 Hong Kong
dtype: object
pd.DataFrame(list(chain.from_iterable(s)), columns=['countries'])
countries
0 United States of America
1 China
2 Hong Kong
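To check the performance difference on your own machine, a rough timing sketch could look like the following. The `big` Series and the two helper functions are hypothetical; actual timings depend on the data size and environment, so none are shown here.

```python
import timeit
from itertools import chain

import pandas as pd

# Hypothetical larger input: 2,000 rows of two-element lists.
big = pd.Series([['a', 'b']] * 2000)

def with_sum():
    # Builds a new list on every addition, so the work grows quadratically.
    return sum(big, [])

def with_chain():
    # Walks each inner list once and builds the result in a single pass.
    return list(chain.from_iterable(big))

# Both approaches produce the same flattened list.
assert with_sum() == with_chain()

t_sum = timeit.timeit(with_sum, number=5)
t_chain = timeit.timeit(with_chain, number=5)
print(f"sum: {t_sum:.4f}s  chain: {t_chain:.4f}s")
```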
Upvotes: 4
Reputation: 323306
Assuming the elements are lists:
pd.Series(s.sum())
Out[103]:
0 United States of America
1 China
2 Hong Kong
dtype: object
Upvotes: 2
Reputation: 71580
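Or use the following (it only works for data shaped like this example, because item() requires exactly one non-missing value in column 1):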
df = pd.DataFrame(s.tolist())
print(df[0].fillna(df[1].dropna().item()))
Output:
0 United States of America
1 China
2 Hong Kong
Name: 0, dtype: object
Upvotes: 2