DataBrown

Reputation: 11

Defining proper separators with text in pandas read_csv

I've been reading up on machine learning with Python and sklearn. I practiced with the iris dataset and then went looking for other datasets on the UCI website.

I found one that was called "Amazon Book Reviews".

The documentation says each entry is separated with a new line and each of the four attributes is separated with a blank-space " ".

Unfortunately, the data contains spaces everywhere, since each entry has a title (text) and a description (HTML).

When I try to use the pandas read_csv function, of course it doesn't know where to separate the columns, and neither do I.

Any ideas? Am I just way too out of my depth for a machine learning (and programming in general) beginner?

Upvotes: 0

Views: 52

Answers (2)

OneCricketeer

Reputation: 191728

each entry is separated with a new line and each of the four attributes is separated with a blank-space " "

read_csv provides an optional sep argument where the default is ','

You can make this a space.
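As a minimal sketch of what that looks like, assuming a made-up four-column sample in which no field contains a space (the sample rows and column names below are hypothetical, not taken from the actual dataset):

```python
import io

import pandas as pd

# Hypothetical sample: four space-separated fields per line,
# none of which contains a space itself.
sample = "B001 4.0 120 2014\nB002 3.5 87 2015\n"

df_space = pd.read_csv(
    io.StringIO(sample),  # stands in for a path to the real file
    sep=" ",              # split columns on a single space
    header=None,          # the sample has no header row
    names=["id", "rating", "votes", "year"],  # illustrative names only
)

print(df_space)
```

Note that this only gives clean columns when the fields themselves contain no spaces, which is the problem described in the question for the title and description fields.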

Upvotes: 0

msitt

Reputation: 1237

You haven't done anything wrong; the documentation is incorrect. The delimiter used in the data files is actually a tab character ('\t'). You can pass this as the delimiter (or sep) parameter to pandas.read_csv.
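Here is a rough sketch of reading tab-separated data where the text fields contain spaces. The sample rows and the column names are invented for illustration; substitute the path to the downloaded file and whatever attribute names the dataset actually uses:

```python
import io

import pandas as pd

# Hypothetical sample: four tab-separated fields per line.
# The title and description contain spaces, which is fine
# because columns are split on tabs only.
sample = (
    "B001\tA Tale of Two Cities\t4.5\t<p>A classic novel.</p>\n"
    "B002\tMoby Dick\t3.9\t<p>A long sea voyage.</p>\n"
)

df_tab = pd.read_csv(
    io.StringIO(sample),  # stands in for a path to the real file
    sep="\t",             # equivalent to delimiter="\t"
    header=None,          # the sample has no header row
    names=["id", "title", "rating", "description"],  # illustrative names only
)

print(df_tab["title"])
```

If the HTML descriptions contain stray double-quote characters, the default quote handling may misparse some rows; passing quoting=csv.QUOTE_NONE (from the standard csv module) is one thing to try in that case.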

Good luck with your analysis!

Upvotes: 2
