John

Reputation: 356

R importing csv with comma missing on header

In R, the CSV files I import are missing a comma at the end of the header row. I download new files every day, so I would like to fix the problem in R rather than edit the files each time. Here is an example:

head1,head2,head3"dat1","dat2","123","dat1b","dat2b","456"

The files appear to have quotes around all data, not just strings. The import method I am using is:

mydata <- read.csv('mycsv.csv', stringsAsFactors=FALSE)

Possibly I can find the first quote and insert a comma before it.

Thanks in advance

Upvotes: 1

Views: 798

Answers (2)

user3117837

Reputation: 97

You can try

data=scan(file.choose(),"")

Upvotes: 0

jbaums

Reputation: 27388

This is probably not the most elegant solution, but it may suffice.

First, read in whole lines rather than attempting to interpret the file as CSV straight away. Where I've used textConnection in this first code block, you can supply a file path or URL instead, e.g. readLines('/path/to/my/strange.csv').

tmp <- readLines(textConnection('head1,head2,head3"dat1","dat2","123"
"dat3","dat4","456"
"dat5","dat6","789"
"dat7","dat8","012"
"dat9","dat10","345"
"dat11","dat12","678"'))

Then a bit of manipulation of the first row:

h <- sub('\".*', '', tmp[1]) # extracts the headers from the first line
row1 <- sub('[^\"]*(.*)', '\\1', tmp[1]) # extracts the first row's data
tmp <- c(row1, tmp[-1]) # combines the first row's data with subsequent rows' data

Now interpret it as a CSV:

dat <- read.csv(textConnection(tmp), header=FALSE) # read tmp in as a csv
names(dat) <- strsplit(h, ',')[[1]] # add headers

dat

  head1 head2 head3
1  dat1  dat2   123
2  dat3  dat4   456
3  dat5  dat6   789
4  dat7  dat8    12
5  dat9 dat10   345
6 dat11 dat12   678
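
Since the files arrive daily, the steps above could be wrapped in a small function. This is only a sketch: the name read_fused_header_csv is made up for illustration, and it assumes every file has the same problem, i.e. the header is unquoted and the first double quote on line 1 marks the start of the first data row. Extra arguments are passed through to read.csv.

```r
# Hypothetical helper (name is illustrative, not from any package).
# Assumes the first data row is fused onto the end of an unquoted header line.
read_fused_header_csv <- function(path, ...) {
  lines <- readLines(path)
  # Everything before the first double quote is the header text
  header <- sub('".*', '', lines[1])
  # Everything from the first double quote onward is the first data row
  row1 <- sub('^[^"]*', '', lines[1])
  # Parse the repaired lines as headerless CSV, then attach the names
  dat <- read.csv(text = c(row1, lines[-1]), header = FALSE, ...)
  names(dat) <- strsplit(header, ',')[[1]]
  dat
}

# Usage with a throwaway file that mimics the malformed layout
demo_file <- tempfile(fileext = ".csv")
writeLines(c('head1,head2,head3"dat1","dat2","123"',
             '"dat7","dat8","012"'), demo_file)
dat <- read_fused_header_csv(demo_file, stringsAsFactors = FALSE)
```

Note that in the output above "012" was read as the number 12, because read.csv converts quoted digits to numeric by default. If the leading zeros matter, passing colClasses = "character" (here via the ... argument) keeps every column as text.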

Upvotes: 7
