Reputation: 671
I am trying to load a comma-delimited data file that also has commas in one of its text columns. The following sample code generates such a file, 'test.csv', which I'll load using read.csv() to illustrate my problem.
> d <- data.frame(name = c("John Smith", "Smith, John"), age = c(34, 34))
> d
         name age
1  John Smith  34
2 Smith, John  34
> write.csv(d, file = "test.csv", quote = F, row.names = F)
> d2 <- read.csv("test.csv")
> d2
            name age
John Smith    34  NA
Smith       John  34
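For reference, test.csv as written above (unquoted) should look like this:

name,age
John Smith,34
Smith, John,34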
Because of the ',' in 'Smith, John', d2 is not assigned correctly. How do I read the file so that d2 looks exactly like d?
Thanks.
Upvotes: 4
Views: 1661
Reputation: 270348
1) read.pattern: read.pattern in the gsubfn package can read such files:
library(gsubfn)
pat <- "(.*),(.*)"
read.pattern("test.csv", pattern = pat, header = TRUE, as.is = TRUE)
giving:
         name age
1  John Smith  34
2 Smith, John  34
2) two pass: Another possibility is to read the file in, fix it up and then re-read it. This uses no packages and gives the same output.
L <- readLines("test.csv")
read.table(text = sub("(.*),", "\\1|", L), header = TRUE, sep = "|", as.is = TRUE)
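Because the (.*) in the sub call is greedy, only the last comma on each line is turned into a pipe, so the intermediate lines passed to read.table should look like:

name|age
John Smith|34
Smith, John|34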
Note: For 3 fields, with a third field added at the end (so the comma-containing text field still comes first), use this in (1):
pat <- "(.*),([^,]+),([^,]+)"
For the same situation, use this in (2), assuming that there are non-space characters on both sides of each of the last two commas, at least one space adjacent to any comma in the text field, and that each field has at least 2 characters:
text = gsub("(\\S),(\\S)", "\\1|\\2", L)
If you have some other arrangement, just modify the regular expression in (1) and the sub or gsub in (2) accordingly.
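For example, under the assumptions in the note, a hypothetical 3-field file (called test3.csv here, with a made-up city column added after age) could be handled by either approach along these lines:

library(gsubfn)

# hypothetical 3-field input: comma-containing name first, then two comma-free fields
writeLines(c("name,age,city",
             "John Smith,34,Boston",
             "Smith, John,34,Boston"), "test3.csv")

# (1) read.pattern with the 3-field pattern
pat <- "(.*),([^,]+),([^,]+)"
read.pattern("test3.csv", pattern = pat, header = TRUE, as.is = TRUE)

# (2) two pass: field-separating commas have non-space characters on both sides
#     and become pipes; the comma in "Smith, John" is followed by a space, so it is left alone
L <- readLines("test3.csv")
read.table(text = gsub("(\\S),(\\S)", "\\1|\\2", L), header = TRUE, sep = "|", as.is = TRUE)

Both calls should give a three-column data frame with the comma preserved in "Smith, John".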
Upvotes: 5