Reputation: 18810
I'm dealing with a CSV file that contains the text of novels.
book_id, title, content
1, book title 1, All Passion Spent is written in three parts, primarily from the view of an intimate observer.
2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance from India who has ever since been in love with her, introduces himself and they form a quiet but playful and understanding friendship. It cost 3,4234 to travel.
The text in the content column contains commas, so when you try to use pandas.read_csv you get pandas.errors.ParserError: Error tokenizing data. C error:
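To make the mismatch concrete, here is a minimal sketch that counts the comma-separated fields on each line of the sample above. The `sample` and `field_counts` names are only for illustration; the rows are copied from the sample data.

```python
# Rows copied verbatim from the sample data in the question.
sample = (
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)

# A naive comma split yields a different number of fields on every line,
# because the commas inside the content text are treated as delimiters.
field_counts = [len(line.split(",")) for line in sample.splitlines()]
print(field_counts)  # [3, 4, 6]
```

The header has 3 fields while the data rows have 4 and 6, so the rows cannot be mapped consistently onto the three columns.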
There are some solutions to this problem on SO, but none of them worked. I also tried reading it as a regular file and then passing the result to a DataFrame, but that failed too. SO - Solution
Upvotes: 1
Views: 550
Reputation: 82765
You can try reading the file yourself and splitting each line with str.split(",", 2), which splits on the first two commas only and leaves any commas in the content column untouched,
and then convert the result to a DF.
Ex:
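import pandas as pd

# filename is the path to your CSV file
with open(filename, "r") as infile:
    # The first line holds the column names
    header = infile.readline().strip().split(",")
    # Split each row on the first two commas only, so the content stays in one piece
    content = [i.strip().split(",", 2) for i in infile.readlines()]
df = pd.DataFrame(content, columns=header)
print(df)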
Output:
  book_id         title                                            content
0       1  book title 1  All Passion Spent is written in three parts, ...
1       2  Book Title 2  In particular Mr FitzGeorge, a forgotten acq...
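As a self-contained variant of the above, the following sketch runs the same split on an in-memory copy of the sample rows, so it can be tried without a file on disk. The helper name `read_loose_csv` and the `raw` buffer are hypothetical, introduced only for this example; stripping whitespace and converting book_id to an integer are optional extras, not part of the approach above.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for the real file.
raw = io.StringIO(
    "book_id, title, content\n"
    "1, book title 1, All Passion Spent is written in three parts, "
    "primarily from the view of an intimate observer.\n"
    "2, Book Title 2, In particular Mr FitzGeorge, a forgotten acquaintance "
    "from India who has ever since been in love with her, introduces himself "
    "and they form a quiet but playful and understanding friendship. "
    "It cost 3,4234 to travel.\n"
)


def read_loose_csv(handle):
    """Hypothetical helper: split each row into as many fields as the header has."""
    header = [name.strip() for name in handle.readline().split(",")]
    rows = []
    for line in handle:
        if not line.strip():
            continue  # skip blank lines
        # maxsplit = number of columns - 1, so extra commas stay in the last column
        parts = line.rstrip("\n").split(",", len(header) - 1)
        rows.append([part.strip() for part in parts])
    return pd.DataFrame(rows, columns=header)


df = read_loose_csv(raw)
df["book_id"] = df["book_id"].astype(int)  # optional: numeric ids instead of strings
print(df.shape)
print(df.loc[1, "content"][-25:])
```

Note that this only works because the free-text column is the last one; a comma inside the title would still shift the split.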
Upvotes: 1