Reputation: 11
I'm trying to split array values into columns.
I've created a Google Colab notebook and you can find my code here.
Here is a screenshot of the data (Hashtags):
Here is a representation of the data.
codes
1 [71020]
2 [77085]
3 [36415]
4 [99213, 99287]
5 [99233, 99233, 99233]
I want to split these arrays into separate columns.
To something like this (screenshot - Hashtags split to columns):
Here is a representation of it.
code_1 code_2 code_3
1 71020
2 77085
3 36415
4 99213 99287
5 99233 99233 99233
I tried the following code, which I got from this Stack Overflow post, but it doesn't give the expected results:
df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist())
What am I doing wrong?
Upvotes: 1
Views: 714
Reputation: 4407
The reason is that the lists are still stored as strings in the hashtags column when you read them with read_csv. You can convert them while reading the data (the following code is taken from the Colab notebook):
import pandas as pd
from ast import literal_eval
url = "https://raw.githubusercontent.com/hashimputhiyakath/datasets/main/hashtags10.csv"
# Notice the added converter to turn strings into lists.
df = pd.read_csv(url, converters={'hashtags': literal_eval})
And then the solution you mentioned will work as expected.
df_hashtags_splitted = pd.DataFrame(df['hashtags'].tolist(), index=df.index).add_prefix('hashtag_')
print(df_hashtags_splitted.head(10))
hashtag_0 hashtag_1 hashtag_2 hashtag_3 hashtag_4 hashtag_5 hashtag_6 hashtag_7 hashtag_8 hashtag_9 hashtag_10 hashtag_11
0 longcovid covidhelp None None None None None None None None None None
1 mumbai covid hospitalbeds covidemergency mahacovid oxygenbed mumbaicovid covid19indiahelp covidhelp covidresources None None
2 kawahcoffeeshop coffeelover kawah costarica puravida heredia oxygen None None None None None
3 lucknow mumbai hyderabad delhi verified covidresources covidhelp covid19indiahelp None None None None
4 oxygen None None None None None None None None None None None
5 covid19indiahelp mahara None None None None None None None None None None
6 oxygen amadoda None None None None None None None None None None
7 plasmadonordelhi plasmamumbai covid19indiahelp covidhelp covidemergency2021 None None None None None None None
8 oxygen conservation wilding rewilding environment sustainability restorative agriculture wildlife biodiversity water wildswim
9 covid verified mumbai oxygen covidemergency2021 covid19 covidhelp covidresources None None None None
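To see the difference without downloading anything, here is a minimal sketch that reproduces the problem on an in-memory CSV. The csv_text sample is made up to mirror the codes example from the question, and the code_1, code_2, ... column names are built by hand; none of it comes from the notebook.

```python
import io
from ast import literal_eval

import pandas as pd

# Hypothetical in-memory CSV mirroring the "codes" example from the question.
csv_text = '''codes
"[71020]"
"[77085]"
"[36415]"
"[99213, 99287]"
"[99233, 99233, 99233]"
'''

# Without a converter, each cell is a string such as "[99213, 99287]", not a list,
# so building a DataFrame from the values yields a single column of strings.
raw = pd.read_csv(io.StringIO(csv_text))
raw_split = pd.DataFrame(raw["codes"].tolist())

# With the converter, each cell is parsed into a real Python list,
# and the lists expand into one column per element.
parsed = pd.read_csv(io.StringIO(csv_text), converters={"codes": literal_eval})
split = pd.DataFrame(parsed["codes"].tolist(), index=parsed.index)

# Use 1-based column names as in the desired output of the question.
split.columns = [f"code_{i + 1}" for i in range(split.shape[1])]
print(split)
```

Rows with fewer elements are padded with missing values, so the numeric columns that contain gaps come back as floats rather than integers.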
Alternatively, to convert the strings to lists after you read the CSV, you can do:
df['hashtags'] = df['hashtags'].map(literal_eval)
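One caveat with converting after the fact: literal_eval raises an error if a cell is already a list, for example when the notebook cell is run twice. A small sketch of a guarded version follows; the to_list helper and the two-row sample frame are hypothetical, used only for illustration.

```python
from ast import literal_eval

import pandas as pd

# Hypothetical sample: list literals stored as strings, as read_csv would return them.
df = pd.DataFrame({"hashtags": ["['longcovid', 'covidhelp']", "['oxygen']"]})


def to_list(value):
    """Parse a string holding a list literal; leave other values untouched."""
    return literal_eval(value) if isinstance(value, str) else value


# Safe to run more than once: already-parsed lists pass through unchanged.
df["hashtags"] = df["hashtags"].map(to_list)
df["hashtags"] = df["hashtags"].map(to_list)

# Expand the lists into columns, keeping the original index.
wide = pd.DataFrame(df["hashtags"].tolist(), index=df.index).add_prefix("hashtag_")
print(wide)
```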
Upvotes: 1