Reputation: 133
I'm trying to import some data into Weka. The data is in a CSV file and consists of a numerical ID followed by some string data (tweets). I'm getting the error "Wrong number of values, Read 1, expected 2 Token[EOL], line 17". I'm using double quotes as the enclosure characters for the string data. I understand that something (presumably an EOL character?) is causing Weka to split some of the string data into multiple entries on the same line, but I'm not sure how to fix the EOL token problem.
My data set can be viewed here. The current data set is on Sheet 2:
https://docs.google.com/spreadsheets/d/1Yclu0t4ITFWn6itYBsVtkGalmP9BPaWFFP6U6jAeLMU/edit?usp=sharing
The text file itself may be found here:
https://drive.google.com/file/d/0B433FqC3TscQQkRxZklQclA3Z3M/view?usp=sharing
The error now occurs on the 3rd line, with the same message. The only newline character there is the one at the end of the line denoting a new entry, so I'm not sure why it's having issues.
Upvotes: 2
Views: 2724
Reputation: 2423
In its datasets, Weka treats a newline character as the end of an instance. Your line 17 is actually a multi-line tweet, which confuses Weka. You can either remove the newline characters from every tweet or escape the newline characters in them. Unfortunately, Weka does not have a mechanism to get rid of this problem by itself (as far as I know).
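One way to remove the embedded newlines before loading the file into Weka is to preprocess the CSV outside of Weka. The following is a minimal sketch in Python, assuming the tweets are properly enclosed in double quotes so that a CSV parser can tell an embedded line break from the end of a record. The function name `flatten_newlines` and the sample data are made up for illustration.

```python
import csv
import io


def flatten_newlines(csv_text):
    """Parse CSV text and replace line breaks inside fields with spaces.

    Hypothetical helper: relies on Python's csv module, which keeps
    line breaks that appear inside a double-quoted field.
    """
    # newline="" stops StringIO from translating line endings itself.
    reader = csv.reader(io.StringIO(csv_text, newline=""))
    rows = []
    for row in reader:
        # splitlines() breaks a field at each line break; joining with a
        # space turns a multi-line tweet into a single line.
        rows.append([" ".join(field.splitlines()) for field in row])
    return rows


# Made-up sample: the first tweet spans two physical lines.
sample = '1,"first line\nsecond line"\n2,"single line"\n'
cleaned = flatten_newlines(sample)
```

The cleaned rows can then be written back out with `csv.writer`, so that each record occupies exactly one line of the file.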
Okay, here are some other things that need to be fixed (according to the edits in your question):
Replace every single quote (') with \'.
Replace every grave accent (`) with \`.
Double quotes inside a tweet (") should be replaced by \".
id, "text"
\"
These are just a few things that I noticed. There might be more. Time will tell.
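The quote replacements can also be scripted rather than done by hand. This is a rough sketch in Python, assuming backslash escaping is what the Weka loader expects, as suggested above. The names `escape_quotes` and `format_row` are hypothetical helpers, not part of Weka.

```python
def escape_quotes(text):
    """Prefix single quotes, grave accents and double quotes with a backslash."""
    for ch in ("'", "`", '"'):
        text = text.replace(ch, "\\" + ch)
    return text


def format_row(tweet_id, text):
    """Build one line of the form id,"text" with the quotes in text escaped."""
    return '{},"{}"'.format(tweet_id, escape_quotes(text))


# Made-up example tweet containing both kinds of quote.
line = format_row(17, 'it\'s a "test"')
```

Backslashes that already appear in a tweet are left untouched here; whether they need escaping as well is something to check against your data.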
Upvotes: 2