James Bond
James Bond

Reputation: 7913

in python pandas, how to unpack the lists in a column?

i have a python data frame, which contains 3 columns:

['date', 'country', 'dollar']

country is a string, which normally looks like 'singapore' 'ukraine' etc

sometimes, country column's item is a list of countries, separated by |, such as

'US|UK|Germany'

the corresponding row will be:

20140101, 'US|UK|Germany', 123456

what i want is to 'unpack' the country column, and makes it strictly 1 country per row, the above row should be unpacked into 3 rows:

20140101, 'US', 123456
20140101, 'UK', 123456
20140101, 'Germany', 123456

is there any neat way to do this?

thanks!

Upvotes: 6

Views: 15787

Answers (3)

Pablo
Pablo

Reputation: 300

Use explode

df = df.explode('country')

Upvotes: 10

CT Zhu
CT Zhu

Reputation: 54340

This solution will change the order of you columns, which I think is fine in most cases. You can replace dict with OrderedDict if you want to preserve the column orders.

In [31]:
print DF
       date        country  dollar
0  20140101  US|UK|Germany  123456
1  20140101  US|UK|Germany  123457

[2 rows x 3 columns]
In [32]:

DF.country=DF.country.apply(lambda x: x.split('|'))
print DF
       date            country  dollar
0  20140101  [US, UK, Germany]  123456
1  20140101  [US, UK, Germany]  123457

[2 rows x 3 columns]
In [33]:

print pd.concat([pd.DataFrame(dict(zip(DF.columns,DF.ix[i]))) for i in range(len(DF))])
   country      date  dollar
0       US  20140101  123456
1       UK  20140101  123456
2  Germany  20140101  123456
0       US  20140101  123457
1       UK  20140101  123457
2  Germany  20140101  123457

[6 rows x 3 columns]

Upvotes: 7

vminof
vminof

Reputation: 121

Here you go:

a = [20140101, 'US|UK|Germany', 123456]
[[a[0], country, a[2]] for country in a[1].split('|')]


[[20140101, 'US', 123456],
 [20140101, 'UK', 123456],
 [20140101, 'Germany', 123456]]

Upvotes: 1

Related Questions