How can I split Pandas arrays into columns?

I'm trying to split array values into columns.

I've created a Google Colab notebook and you can find my code here.

Here is a representation of the data.

    codes
1   [71020]
2   [77085]
3   [36415]
4   [99213, 99287]
5   [99233, 99233, 99233]

I want to split these arrays into separate columns, something like this:

                   code_1      code_2      code_3   
1                  71020
2                  77085
3                  36415
4                  99213       99287
5                  99233       99233       99233

I tried the following code, which I got from this Stack Overflow post, but it doesn't give the expected results:

df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist())

What am I doing wrong?

Upvotes: 1

Views: 714

Answers (1)

user2246849

Reputation: 4407

The reason is that read_csv reads the lists in the hashtags column as plain strings, so tolist() returns a list of strings rather than a list of lists. You can convert them while reading the data (the following code is taken from the Colab notebook):

import pandas as pd
from ast import literal_eval

url = "https://raw.githubusercontent.com/hashimputhiyakath/datasets/main/hashtags10.csv"

# Notice the added converter to turn strings into lists.
df = pd.read_csv(url, converters={'hashtags': literal_eval})

And then the solution you mentioned will work as expected.

df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist(), index=df.index).add_prefix('hashtag_')
print(df_hashtags_splitted.head(10))
          hashtag_0     hashtag_1         hashtag_2       hashtag_3           hashtag_4       hashtag_5    hashtag_6         hashtag_7  hashtag_8       hashtag_9 hashtag_10 hashtag_11
0         longcovid     covidhelp              None            None                None            None         None              None       None            None       None       None
1            mumbai         covid      hospitalbeds  covidemergency           mahacovid       oxygenbed  mumbaicovid  covid19indiahelp  covidhelp  covidresources       None       None
2   kawahcoffeeshop   coffeelover             kawah       costarica            puravida         heredia       oxygen              None       None            None       None       None
3           lucknow        mumbai         hyderabad           delhi            verified  covidresources    covidhelp  covid19indiahelp       None            None       None       None
4            oxygen          None              None            None                None            None         None              None       None            None       None       None
5  covid19indiahelp        mahara              None            None                None            None         None              None       None            None       None       None
6            oxygen       amadoda              None            None                None            None         None              None       None            None       None       None
7  plasmadonordelhi  plasmamumbai  covid19indiahelp       covidhelp  covidemergency2021            None         None              None       None            None       None       None
8            oxygen  conservation           wilding       rewilding         environment  sustainability  restorative       agriculture   wildlife    biodiversity      water   wildswim
9             covid      verified            mumbai          oxygen  covidemergency2021         covid19    covidhelp    covidresources       None            None       None       None
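The same idea can be checked without downloading anything. The following is a minimal sketch, assuming a small in-memory CSV (the csv_text string is made up for illustration and mirrors the codes sample from the question). It shows that the cells are strings without the converter, and numbers the resulting columns from 1 to match the code_1, code_2, code_3 layout asked for:

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical in-memory CSV standing in for the real file.
# Each list is stored as a quoted string, as it would be in a CSV export.
csv_text = (
    "codes\n"
    '"[71020]"\n'
    '"[77085]"\n'
    '"[36415]"\n'
    '"[99213, 99287]"\n'
    '"[99233, 99233, 99233]"\n'
)

# Without a converter, every cell is a plain string such as "[71020]".
raw = pd.read_csv(io.StringIO(csv_text))
print(type(raw.loc[0, "codes"]))

# With the converter, each cell is parsed into a real Python list.
df = pd.read_csv(io.StringIO(csv_text), converters={"codes": literal_eval})

# One column per list position; shorter lists are padded with missing values.
codes_split = pd.DataFrame(df["codes"].tolist(), index=df.index)

# Rename the positional columns 0, 1, 2 to code_1, code_2, code_3.
codes_split.columns = [f"code_{i + 1}" for i in codes_split.columns]
print(codes_split)
```

Because some rows have fewer codes than others, the padded columns contain missing values, so pandas will likely store them as floats rather than integers.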

Alternatively, to convert the strings to lists after you have read the CSV, you can do:

df['hashtags'] = df['hashtags'].map(literal_eval)
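In a notebook, running that line a second time fails, because literal_eval is then handed lists instead of strings. One possible workaround is sketched below, assuming a small hand-made frame; the to_list helper is a hypothetical name introduced here, not part of pandas or of the original notebook:

```python
from ast import literal_eval

import pandas as pd

# Hypothetical frame mimicking what read_csv returns without a converter.
df = pd.DataFrame({"hashtags": ["['longcovid', 'covidhelp']", "['oxygen']"]})


def to_list(value):
    # Parse only strings, so re-running this on already-parsed lists is harmless.
    return literal_eval(value) if isinstance(value, str) else value


df["hashtags"] = df["hashtags"].map(to_list)
df["hashtags"] = df["hashtags"].map(to_list)  # second run leaves the lists unchanged

# Split the lists into one column per position, as in the answer above.
df_hashtags_splitted = pd.DataFrame(
    df["hashtags"].tolist(), index=df.index
).add_prefix("hashtag_")
print(df_hashtags_splitted)
```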

Upvotes: 1
