Manojit

Reputation: 671

Reading comma-separated strings with read.csv()

I am trying to load a comma-delimited data file that also has commas in one of its text columns. The following sample code generates such a file, 'test.csv', which I'll load using read.csv() to illustrate my problem.

> d <- data.frame(name = c("John Smith", "Smith, John"), age = c(34, 34))
> d
         name age
1  John Smith  34
2 Smith, John  34
> write.csv(d, file = "test.csv", quote = F, row.names = F)
> d2 <- read.csv("test.csv")
> d2
            name age
John Smith    34  NA
Smith       John  34

Because of the ',' in "Smith, John", d2 is not read correctly: that row is split into three fields instead of two. How do I read the file so that d2 looks exactly like d?

Thanks.

Upvotes: 4

Views: 1661

Answers (1)

G. Grothendieck

Reputation: 270348

1) read.pattern

read.pattern (in the gsubfn package) can read such files:

library(gsubfn)

pat <- "(.*),(.*)"
read.pattern("test.csv", pattern = pat, header = TRUE, as.is = TRUE)

giving:

         name age
1  John Smith  34
2 Smith, John  34
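To see why the pattern in (1) works, here is a minimal base R sketch that applies the same regular expression with regexec() and regmatches(). It only illustrates how the greedy first group captures everything up to the last comma; it is not how read.pattern is implemented. The file contents are inlined as a character vector (an assumption made so the example runs without test.csv).

```r
# Lines mirroring test.csv, inlined here so no file is needed (assumption)
L <- c("name,age", "John Smith,34", "Smith, John,34")

pat <- "(.*),(.*)"

# For each line, regmatches() returns the full match followed by the capture groups
m <- regmatches(L, regexec(pat, L))

# Drop the full match (element 1) and keep the two captured fields per line
fields <- t(sapply(m, function(x) x[-1]))

# First row is the header; the remaining rows are data
d2 <- data.frame(fields[-1, , drop = FALSE], stringsAsFactors = FALSE)
names(d2) <- fields[1, ]
d2$age <- as.numeric(d2$age)

print(d2)
```

Because the first (.*) is greedy, the comma inside "Smith, John" stays in the name field and only the last comma acts as the separator.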

2) two pass

Another possibility is to read the file in, fix it up and then re-read it. This uses no packages and gives the same output.

L <- readLines("test.csv")
read.table(text = sub("(.*),", "\\1|", L), header = TRUE, sep = "|", as.is = TRUE)
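The intermediate step of the two-pass approach can be inspected on its own. The sketch below inlines the file contents as a character vector (an assumption, so it runs without test.csv) and prints the lines after the last comma on each has been replaced with "|".

```r
# Lines mirroring test.csv, inlined here so no file is needed (assumption)
L <- c("name,age", "John Smith,34", "Smith, John,34")

# Greedy (.*) runs to the LAST comma on each line, so only that comma becomes "|"
fixed <- sub("(.*),", "\\1|", L)
print(fixed)

# Second pass: parse the repaired lines using "|" as the field separator
d2 <- read.table(text = fixed, header = TRUE, sep = "|", as.is = TRUE)
print(d2)
```

This relies on "|" not occurring anywhere in the data; if it might, any other character that is absent from the file could serve as the temporary separator.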

Note: For 3 fields, with the comma-containing text field first and the third field at the end, use this pattern in (1):

pat <- "(.*),([^,]+),([^,]+)"

For the same situation, use this in (2), assuming that there are non-spaces adjacent to each of the last two commas, that there is at least one space adjacent to any comma in the text field, and that fields have at least 2 characters:

text = gsub("(\\S),(\\S)", "\\1|\\2", L)

If you have some other arrangement just modify the regular expression in (1) appropriately and the sub or gsub in (2).
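As a sketch of the three-field case, the example below uses made-up data with a hypothetical third column, city, which is not part of the original file. It runs the gsub from (2) and, for comparison, a sub built from the three-group pattern given for (1); the second variant is my own illustration and does not depend on the spacing assumption.

```r
# Hypothetical three-field lines; the "city" column is invented for illustration
L <- c("name,age,city", "John Smith,34,Boston", "Smith, John,34,Denver")

# Variant from (2): replace only commas that have a non-space on both sides
fixed_gsub <- gsub("(\\S),(\\S)", "\\1|\\2", L)

# Variant using the three-group pattern from (1) as a sub() replacement:
# the last two fields may not contain commas, the first takes the rest
fixed_sub <- sub("(.*),([^,]+),([^,]+)", "\\1|\\2|\\3", L)

print(fixed_gsub)

d3 <- read.table(text = fixed_gsub, header = TRUE, sep = "|", as.is = TRUE)
print(d3)
```

On this data both variants produce the same repaired lines. The gsub version would fail if the text field contained a comma with no space next to it, or if a middle field were a single character.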

Upvotes: 5
