Reputation: 25
I am having problems reading the csv-file below (extract) using R:
id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"
df <- read_csv("don.csv", quote = "")
gives me qoutes in cells, which I can process afterwards, but can it not be done more smoothly during importing?
Upvotes: 0
Views: 63
Reputation: 270258
1) If there are no quotes in the input other than ones we don't want then this would work. If the input is coming from a file replace textConnection(Lines)
with "don.csv"
.
L <- readLines(textConnection(Lines))
read.csv(text = gsub('"', '', L))
giving:
id created_date stars charity_id user_id is_anonymous user_country_id
1 1 2016-08-10 12:50:30 100 65536 32772 NULL 110
2 65 2016-11-09 07:57:32 50 425986 2686978 1 110
3 66 2016-11-09 08:07:51 50 393217 753673 0 110
2) Also assuming that double quotes are all unwanted, another possibility is:
read.csv(pipe("sed 's/\042//g' don.csv"))
On Windows you will need to have Rtools installed and on your path for this to work or, if not on your path give the full path, e.g. "C:\\Rtools\\bin\\sed"
.
The input, Lines
is:
Lines <-
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
Upvotes: 3
Reputation: 12569
You can use:
d <- read.table(sep='"', skip=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",")
# or d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
(For your file you have to use file="don.csv"
instead of my text=...
.)
The result is
# d
# V1 V2 V3 V4 V5 V6 V7
# 1 NA 1, NA 2016-08-10 12:50:30 NA ,100,65536,32772,NULL,110 NA
# 2 NA 65, NA 2016-11-09 07:57:32 NA ,50,425986,2686978,1,110 NA
# 3 NA 66, NA 2016-11-09 08:07:51 NA ,50,393217,753673,0,110 NA
# d2
# V1 V2 V3 V4 V5 V6 V7
# 1 1 NA 100 65536 32772 NULL 110
# 2 65 NA 50 425986 2686978 1 110
# 3 66 NA 50 393217 753673 0 110
Eventually you want to rename the columns and bind the columns together with cbind()
The names of the columns you can get with:
cnames <- read.table(sep=',', nrows=1, text=
'id,created_date,stars,charity_id,user_id,is_anonymous,user_country_id
"1,""2016-08-10 12:50:30"",100,65536,32772,NULL,110"
"65,""2016-11-09 07:57:32"",50,425986,2686978,1,110"
"66,""2016-11-09 08:07:51"",50,393217,753673,0,110"'
)
as.character(unlist(cnames[1,]))
(For your file you have to use file="don.csv"
instead of my text=...
.)
cnames <- read.table(sep=',', nrows=1, file="don.csv")
H <- as.character(unlist(cnames[1,]))
d <- read.table(sep='"', skip=1, file="don.csv")
d2 <- read.table(text=paste0(d$V2, d$V6), sep=",", na.strings = "NULL")
d.d2 <- cbind(d[, 4], d2[, -2])
names(d.d2) <- H[c(2, 1, 3:7)]
d.d2
Upvotes: 0