Anil Tiwari
Anil Tiwari

Reputation: 81

split json within dataframe column into multiple column in python

actually I am fetching data from database which has data as below CSV format

DS,SID,SID_T,E_DATE,S_DATA
TECH,312,TID,2021-01-03,"{""idx"":""msc"",""cid"":""f323d3"",""iname"":""master_in_science"",""mcap"":21.33,""sg"":[{""upt"":true,""dwt"":true,""high_low"":false}]}"
TECH,343,TID,2021-01-03,"{""idx"":""bsc"",""cid"":""k33d3"",""iname"":""bachelor_in_science"",""mcap"":81.33,""sg"":[{""upt"":false,""dwt"":true,""high_low"":false}]}"
TECH,554,TID,2021-01-03,"{""idx"":""ba"",""cid"":""3d3f32"",""iname"":""bachelor_in_art"",""mcap"":67.83,""sg"":[{""upt"":true,""dwt"":false,""high_low"":false}]}"
TECH,323,TID,2021-01-03,"{""idx"":""ma"",""cid"":""m23k66"",""iname"":""master_in_art"",""mcap"":97.13,""sg"":[{""upt"":true,""dwt"":true,""high_low"":true}]}"

dataframe look like this enter image description here

i wanted to split the S_DATA column into multiple column hence the output is like this enter image description here

what i have tried i tried to convert the dataframe into json + and trying to normalize the json using pandas.normalize

but i was unable to do the so

also S_DATA.sg values i.e.""sg"":[{""upt"":true,""dwt"":true,""high_low"":false} is also creating the trouble while the entire conversion process

Upvotes: 0

Views: 1053

Answers (1)

Rodrigo Fernando Dias
Rodrigo Fernando Dias

Reputation: 41

Try this:

import pandas as pd
import json

df = pd.read_csv('data.csv')
df['S_DATA'] = df['S_DATA'].apply(lambda x: json.loads(x))
pd.concat([df[df.columns.difference(['S_DATA'])], pd.json_normalize(df.S_DATA)], axis=1)

Upvotes: 1

Related Questions