Manuel

Reputation: 143

Import CSV with nested JSON into Pandas DataFrame

I'm trying to import a CSV file that contains two nested JSON objects per row, using Jupyter Notebook.

I'm getting this error.

ParserError: Error tokenizing data. C error: Expected 29 fields in line 3, saw 35

The problem is that pandas doesn't recognise the JSON objects and splits on every comma (the CSV delimiter), including the commas inside the JSON.

Here's a sample row of the CSV file:

309,DVD10_Welt.mxf,16947519284,00:37:32:24,0_yd3ugljx,"{"Type":"Source","Content-Type":"Beitrag"}",97,"Welt",NULL,NULL,NULL,"{"ContentType":"Beitrag","Description":"Sie beobachten jeden.","Keywords":["wissensthek","zukunft","\u00dcberwachung","roboter","technik","internet","dvd","wissen"],"ProductionDate":"2013-07-10T00:30:06.000Z","TitleIntern":null}"

This is the line I run in Jupyter:

df = pd.read_csv(csv_file)
df

Can someone please give me a hint?

Thanks, Manuel

Upvotes: 0

Views: 839

Answers (1)

MaxU - stand with Ukraine

Reputation: 210852

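I don't think you can read it without preprocessing, because it's not a valid CSV file: the double quotes inside the quoted JSON fields are not escaped, so the parser can't tell where each field ends.

If you can save your CSV file properly quoted, with every double quote inside a quoted field doubled, it'll work.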
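For reference, "properly quoted" here means the usual CSV convention: a field containing commas or double quotes is wrapped in double quotes, and each inner double quote is written twice. A minimal sketch with Python's standard `csv` module, which applies that rule by default (the in-memory buffer is only for illustration):

```python
import csv
import io

json_text = '{"Type":"Source","Content-Type":"Beitrag"}'

# Write one row; csv.writer quotes the JSON field and doubles its inner quotes.
buffer = io.StringIO()
csv.writer(buffer).writerow([1, json_text])
written = buffer.getvalue()
print(written)

# Reading the line back restores the original JSON text unchanged.
row = next(csv.reader(io.StringIO(written)))
print(row[1] == json_text)
```
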

Demo:

In [87]: df = pd.DataFrame({'ID':[1,2]})

In [88]: df['JSON'] = '{"Type":"Source","Content-Type":"Beitrag"}'

In [89]: df
Out[89]:
   ID                                        JSON
0   1  {"Type":"Source","Content-Type":"Beitrag"}
1   2  {"Type":"Source","Content-Type":"Beitrag"}

In [90]: df.to_csv('d:/temp/a.csv', index=False)

Resulting CSV:

ID,JSON
1,"{""Type"":""Source"",""Content-Type"":""Beitrag""}"
2,"{""Type"":""Source"",""Content-Type"":""Beitrag""}"

Check:

In [91]: pd.read_csv('d:/temp/a.csv')
Out[91]:
   ID                                        JSON
0   1  {"Type":"Source","Content-Type":"Beitrag"}
1   2  {"Type":"Source","Content-Type":"Beitrag"}
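If re-exporting the file is not an option, the preprocessing mentioned at the top could look roughly like the sketch below. It is not a general CSV repair tool. It assumes that every JSON field has the form `"{...}"`, sits between commas (or at the start or end of a line), stays on one line, and that no JSON string value contains the sequence `}",`. The names `JSON_FIELD` and `repair_line` are made up for this example.

```python
import io
import json
import re

import pandas as pd

# A quoted field that starts with "{ right after a comma (or at line start)
# and ends with }" right before a comma (or at line end).
# Assumption: no JSON string value itself contains the sequence }",
JSON_FIELD = re.compile(r'(^|,)"(\{.*?\})"(?=,|$)')


def repair_line(line):
    """Double the quotes inside each quoted JSON field so the line is valid CSV."""
    def fix(match):
        return match.group(1) + '"' + match.group(2).replace('"', '""') + '"'
    return JSON_FIELD.sub(fix, line)


# The sample row from the question (the raw string keeps \u00dc as written).
sample = r'''309,DVD10_Welt.mxf,16947519284,00:37:32:24,0_yd3ugljx,"{"Type":"Source","Content-Type":"Beitrag"}",97,"Welt",NULL,NULL,NULL,"{"ContentType":"Beitrag","Description":"Sie beobachten jeden.","Keywords":["wissensthek","zukunft","\u00dcberwachung","roboter","technik","internet","dvd","wissen"],"ProductionDate":"2013-07-10T00:30:06.000Z","TitleIntern":null}"'''

repaired = repair_line(sample)

# header=None because the sample has no header row; drop it for the real file.
df = pd.read_csv(io.StringIO(repaired), header=None)

# The JSON columns are still plain strings; json.loads turns one into a dict.
meta = json.loads(df.loc[0, 11])
print(df.shape)
print(meta["Keywords"][:3])
```

For a real file, the same function can be applied line by line before parsing, e.g. `"".join(repair_line(line) for line in open(csv_file))` wrapped in `io.StringIO`. If the assumptions above do not hold for your data, fixing the export itself is the safer route.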

Upvotes: 1
