Reputation: 356
In R, the CSV files I import are missing a comma at the end of the header row. I download new files every day, so I would like to figure out how to fix the problem in R rather than edit the files each time. Here is an example.
head1,head2,head3"dat1","dat2","123","dat1b","dat2b","456"
The files appear to have quotes around all data, not just strings. The import method I am using is:
mydata <- read.csv('mycsv.csv', stringsAsFactors=FALSE)
Possibly I can find the first quote and insert a comma before it.
Thanks in advance
Upvotes: 1
Views: 798
Reputation: 27388
This is probably not the most elegant solution, but it may suffice.
First, read in whole lines rather than attempting to interpret the file as CSV straight away. Where I've used textConnection in this first code block, you can supply a file path or URL, e.g. readLines('/path/to/my/strange.csv').
tmp <- readLines(textConnection('head1,head2,head3"dat1","dat2","123"
"dat3","dat4","456"
"dat5","dat6","789"
"dat7","dat8","012"
"dat9","dat10","345"
"dat11","dat12","678"'))
Then a bit of manipulation of the first row:
h <- sub('\".*', '', tmp[1]) # extracts the headers from the first line
row1 <- sub('[^\"]*(.*)', '\\1', tmp[1]) # extracts the first row's data
tmp <- c(row1, tmp[-1]) # combines the first row's data with subsequent rows' data
Now interpret it as a CSV:
dat <- read.csv(textConnection(tmp), header=FALSE) # read tmp in as a csv
names(dat) <- strsplit(h, ',')[[1]] # add headers
dat
  head1 head2 head3
1  dat1  dat2   123
2  dat3  dat4   456
3  dat5  dat6   789
4  dat7  dat8    12
5  dat9 dat10   345
6 dat11 dat12   678
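Since new files arrive every day, the steps above could be wrapped in a small function. This is only a minimal sketch: read_fused_csv is a hypothetical name (not part of base R), and it assumes that the first double quote on line 1 always marks the start of the first data row and that the header names themselves are never quoted.

```r
# Hypothetical helper (illustrative name): reads a CSV whose header is fused
# onto the first data row, e.g.  head1,head2,head3"dat1","dat2","123"
read_fused_csv <- function(path, ...) {
  lines <- readLines(path)
  first <- lines[1]
  # Position of the first double quote on line 1 (-1 if there is none)
  pos <- regexpr('"', first, fixed = TRUE)
  if (pos > 1) {
    header <- substr(first, 1, pos - 1)  # everything before the first quote
    row1 <- substring(first, pos)        # the first data row
    lines <- c(header, row1, lines[-1])
  }
  # Extra arguments (e.g. stringsAsFactors) are passed on to read.csv
  read.csv(text = lines, ...)
}

# Small demonstration using a temporary file with the same layout
demo_file <- tempfile(fileext = ".csv")
writeLines(c('head1,head2,head3"dat1","dat2","123"',
             '"dat3","dat4","456"'), demo_file)
mydata <- read_fused_csv(demo_file, stringsAsFactors = FALSE)
mydata
```

If line 1 contains no quote at all, the file is passed to read.csv unchanged, so the function should also be safe to use on files that arrive already well formed.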
Upvotes: 7