Reputation: 23
My objective was to read a json string located in the 4th column "REQUEST_RE" of a csv file and breakdown that 4th column into its own individual columns.
my data is in the following format per csv row on the 4th column:
Row 2: {"Fruit":"Apple","Cost":"1.5","Attributes":{"ID":"001","Country":"America"}}
Row 3: {"Fruit":"Orange","Cost":"2.0","Attributes":{"ID":"002","Country":"China"}}
to be changed into:
i was trying this: Parsing a JSON string which was loaded from a CSV using Pandas
and i ended up using this:
InterimReport = pd.read_csv(filename, index_col=False, usecols=lambda col: col not in ["SYSTEMID"]) InterimReport.join(InterimReport['REQUEST_RE'].apply(json.loads).apply(pd.Series))
but i was unable to split my json string into columns.
my json string still remained a json string and was unchanged.
Upvotes: 0
Views: 2140
Reputation: 149085
You should load the CSV file ignoring the JSON string at this time.
Then you convert the column to a json list and normalize it:
tmp = pd.json_normalize(InterimReport['REQUEST_RE'].apply(json.loads).tolist()).rename(
columns=lambda x: x.replace('Attributes.', ''))
You should get something like:
Fruit Cost ID Country
0 Apple 1.5 001 America
1 Orange 2.0 002 China
That you can easily concat to the original dataframe:
InterimReport = pd.concat([InterimReport.drop(columns=['REQUEST_RE']), tmp], axis=1)
Upvotes: 1