Reputation: 11
I've been reading up on machine learning with Python and sklearn. I practiced with the iris dataset and then went looking for other datasets on the UCI website.
I found one that was called "Amazon Book Reviews".
The documentation says each entry is separated with a new line and each of the four attributes is separated with a blank-space " ".
Unfortunately, the data contains spaces everywhere, since each entry includes a title (text) and a description (HTML).
When I try to use the pandas read_csv function, it of course doesn't know where to separate the columns, and neither do I.
Any ideas? Am I just way out of my depth as a beginner in machine learning (and programming in general)?
Upvotes: 0
Views: 52
Reputation: 191728
each entry is separated with a new line and each of the four attributes is separated with a blank-space " "
read_csv provides an optional sep argument, whose default is ','. You can set it to a space instead: sep=' '.
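Before settling on a separator, it can help to count how many fields each line yields for a few candidates; the right one should give the same count on every line. A minimal sketch, assuming a made-up two-line sample (the helper name field_counts and the sample rows are hypothetical, not taken from the dataset):

```python
from collections import Counter

def field_counts(lines, sep):
    """Count how many lines split into how many fields for a given separator."""
    return Counter(len(line.rstrip("\n").split(sep)) for line in lines)

# Hypothetical sample rows standing in for the first lines of the real file.
sample_lines = [
    "B0001\tA Tale of Two Cities\t5\t<p>Great read</p>\n",
    "B0002\tMoby Dick\t3\t<p>Long but worth it</p>\n",
]

space_counts = field_counts(sample_lines, " ")   # varies from line to line
tab_counts = field_counts(sample_lines, "\t")    # identical on every line

print("space:", dict(space_counts))
print("tab:  ", dict(tab_counts))
```

If one candidate gives a single, consistent field count that matches the documented number of attributes, that is probably the real delimiter.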
Upvotes: 0
Reputation: 1237
You haven't done anything wrong; the documentation is simply incorrect. The delimiter used in the data files is actually a tab character, '\t'. You can pass this as the delimiter parameter to pandas.read_csv.
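As a rough sketch of what that call might look like, assuming the file has no header row: the column names and the in-memory sample below are invented for illustration, and quoting is switched off on the assumption that the HTML descriptions may contain stray quote characters.

```python
import csv
import io

import pandas as pd

# Hypothetical sample standing in for the real file; replace io.StringIO(sample)
# with the path to the downloaded data file.
sample = (
    "B0001\tA Tale of Two Cities\t5\t<p>Great read</p>\n"
    "B0002\tMoby Dick\t3\t<p>Long but worth it</p>\n"
)

df = pd.read_csv(
    io.StringIO(sample),
    delimiter="\t",          # alias for sep; splits on tabs, not spaces
    header=None,             # assumption: the file has no header line
    names=["id", "title", "rating", "description"],  # hypothetical names
    quoting=csv.QUOTE_NONE,  # treat quote characters as ordinary text
)

print(df.shape)
print(df["title"].tolist())
```

Spaces inside the title and description then stay within their columns, because only tabs are treated as field boundaries.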
Good luck with your analysis!
Upvotes: 2