Titan
Titan

Reputation: 23

How do i parse a json string in a csv column and break it down into multiple columns?

My objective was to read a json string located in the 4th column "REQUEST_RE" of a csv file and breakdown that 4th column into its own individual columns.

my data is in the following format per csv row on the 4th column:

Row 2: {"Fruit":"Apple","Cost":"1.5","Attributes":{"ID":"001","Country":"America"}}

Row 3: {"Fruit":"Orange","Cost":"2.0","Attributes":{"ID":"002","Country":"China"}}

to be changed into:

enter image description here

i was trying this: Parsing a JSON string which was loaded from a CSV using Pandas

and i ended up using this:

InterimReport = pd.read_csv(filename, index_col=False, usecols=lambda col: col not in ["SYSTEMID"]) InterimReport.join(InterimReport['REQUEST_RE'].apply(json.loads).apply(pd.Series))

but i was unable to split my json string into columns.

my json string still remained a json string and was unchanged.

Upvotes: 0

Views: 2140

Answers (1)

Serge Ballesta
Serge Ballesta

Reputation: 149085

You should load the CSV file ignoring the JSON string at this time.

Then you convert the column to a json list and normalize it:

tmp = pd.json_normalize(InterimReport['REQUEST_RE'].apply(json.loads).tolist()).rename(
          columns=lambda x: x.replace('Attributes.', ''))

You should get something like:

    Fruit Cost   ID  Country
0   Apple  1.5  001  America
1  Orange  2.0  002    China

That you can easily concat to the original dataframe:

InterimReport = pd.concat([InterimReport.drop(columns=['REQUEST_RE']), tmp], axis=1)

Upvotes: 1

Related Questions