Reputation: 143
I'm trying to import a CSV file containing two nested JSON objects per row, using Jupyter Notebook.
I'm getting this error:
ParserError: Error tokenizing data. C error: Expected 29 fields in line 3, saw 35
The problem is that pandas doesn't recognise the JSON objects and simply splits on the CSV delimiter, a comma, including the commas inside the JSON.
Here's a sample row of the CSV file:
309,DVD10_Welt.mxf,16947519284,00:37:32:24,0_yd3ugljx,"{"Type":"Source","Content-Type":"Beitrag"}",97,"Welt",NULL,NULL,NULL,"{"ContentType":"Beitrag","Description":"Sie beobachten jeden.","Keywords":["wissensthek","zukunft","\u00dcberwachung","roboter","technik","internet","dvd","wissen"],"ProductionDate":"2013-07-10T00:30:06.000Z","TitleIntern":null}"
This is the code I run in Jupyter:
import pandas as pd
df = pd.read_csv(csv_file)
df
Can someone please give me a hint?
Thanks, Manuel
Upvotes: 0
Views: 839
Reputation: 210852
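I don't think you can read it without preprocessing, because it's not a valid CSV file: the double quotes inside the quoted JSON fields aren't escaped (a literal " inside a quoted CSV field has to be written as ""), so the parser ends the quoted field at the first inner quote and treats the commas inside the JSON as field delimiters.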
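To see the effect, the sample row from the question can be split with Python's standard csv module. This is only a stand-in for the pandas C tokenizer, which handles stray quotes in a similar way but whose exact field count may differ. Because the inner quotes are not doubled, the first JSON object is cut off at its first comma and the row comes out with far more than its 12 intended columns:

```python
import csv

# Sample row from the question: 12 intended columns, two of them JSON objects
# whose inner double quotes are NOT escaped.
row = ('309,DVD10_Welt.mxf,16947519284,00:37:32:24,0_yd3ugljx,'
       '"{"Type":"Source","Content-Type":"Beitrag"}",97,"Welt",NULL,NULL,NULL,'
       '"{"ContentType":"Beitrag","Description":"Sie beobachten jeden.",'
       '"Keywords":["wissensthek","zukunft","\\u00dcberwachung","roboter",'
       '"technik","internet","dvd","wissen"],'
       '"ProductionDate":"2013-07-10T00:30:06.000Z","TitleIntern":null}"')

# Default dialect: comma delimiter, '"' as quote character, non-strict parsing.
fields = next(csv.reader([row]))

print(len(fields))  # many more fields than the 12 intended columns
print(fields[5])    # the first JSON object, cut off at its first comma
```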
If you can save your CSV file with the embedded quotes properly escaped, pandas will read it without any extra work.
Demo:
In [87]: df = pd.DataFrame({'ID':[1,2]})
In [88]: df['JSON'] = '{"Type":"Source","Content-Type":"Beitrag"}'
In [89]: df
Out[89]:
ID JSON
0 1 {"Type":"Source","Content-Type":"Beitrag"}
1 2 {"Type":"Source","Content-Type":"Beitrag"}
In [90]: df.to_csv('d:/temp/a.csv', index=False)
Resulting CSV:
ID,JSON
1,"{""Type"":""Source"",""Content-Type"":""Beitrag""}"
2,"{""Type"":""Source"",""Content-Type"":""Beitrag""}"
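The doubled quotes ("") are how CSV escapes a quote character inside a quoted field, and that escaping is exactly what the original file is missing. As a stdlib-only sketch using the values from the demo, Python's csv module applies the same escaping with its defaults (quoting=csv.QUOTE_MINIMAL, doublequote=True), which could help if the file is written by a Python script you control:

```python
import csv
import io

json_text = '{"Type":"Source","Content-Type":"Beitrag"}'

buf = io.StringIO()
writer = csv.writer(buf)          # defaults: QUOTE_MINIMAL, doublequote=True
writer.writerow(["ID", "JSON"])
writer.writerow([1, json_text])   # field contains quotes and commas, so it gets quoted
print(buf.getvalue(), end="")
# ID,JSON
# 1,"{""Type"":""Source"",""Content-Type"":""Beitrag""}"

# Reading it back restores the original JSON text unchanged.
buf.seek(0)
rows = list(csv.reader(buf))
assert rows[1][1] == json_text
```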
Check:
In [91]: pd.read_csv('d:/temp/a.csv')
Out[91]:
ID JSON
0 1 {"Type":"Source","Content-Type":"Beitrag"}
1 2 {"Type":"Source","Content-Type":"Beitrag"}
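If the file can't be re-exported, the preprocessing could look roughly like the following. This is a minimal sketch, not a general fix: the helper fix_json_quoting and its regular expression are hypothetical and assume that every JSON column is written as "{...}" with unescaped inner quotes, sits on a single line, starts at the beginning of a line or right after a comma, and ends with }" followed by a comma or the end of the line (a JSON string value that itself ends in } followed by ", would break it). The inner quotes are doubled so that pandas can parse the result:

```python
import io
import json
import re

import pandas as pd

# Hypothetical pattern: a JSON object written as "{...}" that starts at the
# beginning of a line or right after a comma, and ends with }" followed by a
# comma or the end of the line. Inner quotes are assumed to be unescaped.
JSON_FIELD = re.compile(r'(?:^|(?<=,))"(\{.*?\})"(?=,|\r?$)', re.MULTILINE)


def fix_json_quoting(text: str) -> str:
    """Hypothetical helper: escape the quotes inside each JSON field by doubling them."""
    return JSON_FIELD.sub(lambda m: '"' + m.group(1).replace('"', '""') + '"', text)


raw = (
    'ID,JSON\n'
    '1,"{"Type":"Source","Content-Type":"Beitrag"}"\n'
    '2,"{"Type":"Source","Content-Type":"Beitrag"}"\n'
)

df = pd.read_csv(io.StringIO(fix_json_quoting(raw)))

# The column now holds valid JSON text, so each cell can be parsed into a dict.
df["JSON"] = df["JSON"].map(json.loads)
print(df.loc[0, "JSON"]["Content-Type"])  # Beitrag
```

For the real file, the idea would be to read its whole text first (for example with open(csv_file, encoding="utf-8"), assuming the file is UTF-8) and pass io.StringIO(fix_json_quoting(text)) to pd.read_csv.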
Upvotes: 1