Reputation: 630
My CSV data looks like this:
name,words,name
John, "He says:"I love it!"", 18
At first, I tried to load the data with:
data <- read.table("data.csv",header = T,sep = ',',quote = "",stringsAsFactors = FALSE)
And the error is:
Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, :
line 1 did not have 3 elements
Well, I can understand that, since R gets confused by the many double quotes.
I fixed it with:
data <- read.table("data.csv", header = TRUE, sep = ',', quote = "\"", stringsAsFactors = FALSE)
However, I can't figure out why this works. How does R know which double quote it should stop at?
Upvotes: 1
Views: 223
Reputation: 37754
Well, that's an interesting data format -- and interesting behavior. The help page says "See scan for the behaviour on quotes embedded in quotes," but I didn't see anything useful in that help page, so I tried some things.
What I believe the quote argument does is to tell R to ignore any sep elements that occur between quotes, and also to remove any quote elements (because that's meant to be used only for delimiting columns, not as data). So this works for you only because you don't have any commas after the second quote in your words column.
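To see the field boundaries directly, here is a small sketch that calls scan (the function named in your error message, which read.table relies on) on single lines held in strings. The helper name and variable names are mine, and trimws is only there to drop the space that follows each comma.

```r
# Rows from the question and from the second example below, as R strings.
colon_line <- 'John, "He says:"I love it!"", 18'
comma_line <- 'John, "He says, "I love it!"", 18'

# Helper (hypothetical name): split one line into fields for a given quote set.
split_fields <- function(line, quote) {
  trimws(scan(text = line, what = "", sep = ",", quote = quote, quiet = TRUE))
}

# quote = "": double quotes are ordinary data, so every comma separates.
raw_colon <- split_fields(colon_line, "")
raw_comma <- split_fields(comma_line, "")

# quote = "\"": each double quote switches "inside quotes" on or off,
# commas inside are kept as data, and the quote characters are dropped.
unq_colon <- split_fields(colon_line, "\"")
unq_comma <- split_fields(comma_line, "\"")

print(raw_colon)
print(raw_comma)
print(unq_colon)
print(unq_comma)
```

Comparing the lengths of raw_comma and unq_comma shows whether the comma after "He says" was treated as a separator or as data.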
Here are four examples.
name,words,name
John, "He says:"I love it!"", 18
Interestingly, this example works for me in both versions of your code. The first leaves in all the quotes and the second removes them.
read.table("data.csv", header = TRUE, sep = ',', quote = "", stringsAsFactors = FALSE)
## name words name.1
## 1 John "He says:"I love it!"" 18
read.table("data.csv", header = TRUE, sep = ',', quote = "\"", stringsAsFactors = FALSE)
## name words name.1
## 1 John He says:I love it! 18
name,words,name
John, "He says, "I love it!"", 18
Here the first version (quote="") separates the row into four columns, not three, based on the commas, and uses the extra column as the rownames. The second version ignores the added comma, but also removes the quotes around the actual quotation.
read.table("text.csv", header = TRUE, sep = ',', quote = "", stringsAsFactors = FALSE)
## name words name.1
## John "He says "I love it!"" 18
read.table("text.csv", header = TRUE, sep = ',', quote = "\"", stringsAsFactors = FALSE)
## name words name.1
## 1 John He says, I love it! 18
name,words,name
John, "He says: "I love it, do you?"", 18
Here both versions do almost the same thing (four columns) because the comma isn't inside a matched pair of quotes. The first keeps the quotes, the second doesn't.
read.table("text.csv", header = TRUE, sep = ',', quote = "", stringsAsFactors = FALSE)
## name words name.1
## John "He says: "I love it do you?"" 18
read.table("text.csv", header = TRUE, sep = ',', quote = "\"", stringsAsFactors = FALSE)
## name words name.1
## John He says: I love it do you? 18
name,words,name
John, "He says, "I love it, do you?"", 18
Here the first one doesn't work, as it finds three column names but five columns in the first row. The second skips the first comma, but not the second, so again separates it into four columns, and uses the extra as the row name.
read.table("text.csv", header = TRUE, sep = ',', quote = "", stringsAsFactors = FALSE)
## Error in read.table("text.csv", header = TRUE, sep = ",", quote = "", :
## more columns than column names
read.table("text.csv", header = TRUE, sep = ',', quote = "\"", stringsAsFactors = FALSE)
## name words name.1
## John He says, I love it do you? 18
Finally, all of these examples only have one line; if you have more than one line and they parse into different numbers of columns, you'll get an error like the one you got, except that it will point to the first line at which the number of columns differs.
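A minimal sketch of that situation, using in-memory data where the second row is my own invention, and count.fields from base R to show how many fields each line yields before read.table gets involved:

```r
# Hypothetical two-row file: the second data row has a comma outside any
# matched pair of quotes, so it yields one more field than the first.
csv_lines <- c(
  'name,words,name',
  'John, "He says: "I love it!"", 18',
  'Mary, "She says: "I love it, do you?"", 21'
)

# count.fields reports the number of fields per line, header included.
con <- textConnection(csv_lines)
fields_per_line <- count.fields(con, sep = ",", quote = "\"")
close(con)
print(fields_per_line)

# read.table fails when the rows disagree; capture the error object
# instead of stopping the script.
result <- tryCatch(
  read.table(text = csv_lines, header = TRUE, sep = ",", quote = "\"",
             stringsAsFactors = FALSE),
  error = function(e) e
)
print(inherits(result, "error"))
if (inherits(result, "error")) print(conditionMessage(result))
```

Running count.fields on the real file with the same sep and quote settings is probably the quickest way to find the lines that disagree.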
What surprises me about your error is that it happens on line 1; you'd get this error if R thought you had fewer than three columns in that line (the number it found in the header row), but on my system, anyway, it finds three elements in that line.
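Separately, if you can change how the file is written, scan treats a doubled quote inside a quoted field as one literal quote when sep is set, so the inner quotes and commas can survive. A small sketch under that assumption (the rewritten line is mine, not your data):

```r
# The fourth example rewritten with each inner quote doubled (my rewrite).
escaped_line <- 'John, "He says, ""I love it, do you?""", 18'

# With quote = "\"" the whole quoted field stays together, and each
# doubled quote is read as a single quote character.
fields <- trimws(scan(text = escaped_line, what = "", sep = ",",
                      quote = "\"", quiet = TRUE))
print(fields)
```

The same quote = "\"" setting should apply when reading a whole file written this way with read.table.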
Upvotes: 1