I am attempting to use readLines to import a 17.6GB CSV file into R. I have tried several approaches discussed here, here, here, and elsewhere, and readLines seems to be the only one that can at least get the data into R.
The problem is that I am unable to convert the output from readLines into a data frame that I can use in my analysis. The answers to a related question here have not helped me solve the problem.
Here is my sample data:
dt <- data.frame(myid = 1:10, var = runif(10))
write.csv(dt, "temp.csv")   # write the same data frame that is printed below
dt
myid var
1 1 0.5949020
2 2 0.8515591
3 3 0.8139010
4 4 0.3804234
5 5 0.4923082
6 6 0.9933775
7 7 0.1740895
8 8 0.8342808
9 9 0.3958154
10 10 0.9690561
Creating chunks:
file_in <- file("temp.csv", "r")   # open a read connection to the file
chunk_size <- 100000               # choose the best size for you
x <- readLines(file_in, n = chunk_size)
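A minimal sketch of carrying the chunking through the whole file, assuming the layout that write.csv produces above (a quoted, unnamed row-name column followed by myid and var, with no commas inside fields). The helper name read_in_chunks is hypothetical. Each block of lines from readLines is parsed with read.csv(text = ...), which accepts a character vector of lines instead of a file name.

```r
# Hypothetical helper (not part of base R): read a large CSV in blocks of
# `chunk_size` lines and turn each block into a data frame.
read_in_chunks <- function(path, chunk_size = 100000) {
  con <- file(path, open = "r")
  on.exit(close(con))                          # close the connection however we exit

  # First line is the header written by write.csv: "","myid","var"
  # (assumes no commas inside the column names)
  header <- readLines(con, n = 1)
  col_names <- gsub('"', "", strsplit(header, ",")[[1]], fixed = TRUE)

  pieces <- list()
  repeat {
    lines <- readLines(con, n = chunk_size)    # next block of raw text lines
    if (length(lines) == 0) break              # nothing left: end of file
    chunk <- read.csv(text = lines, header = FALSE,
                      col.names = col_names, check.names = FALSE)
    pieces[[length(pieces) + 1]] <- chunk[-1]  # drop write.csv's row-name column
  }
  do.call(rbind, pieces)                       # stack the chunks into one data frame
}

# Usage: write a small sample file and read it back 3 lines at a time
set.seed(1)
dt <- data.frame(myid = 1:10, var = runif(10))
path <- tempfile(fileext = ".csv")
write.csv(dt, path)
res <- read_in_chunks(path, chunk_size = 3)
res
```

Collecting every chunk with rbind still needs the full data set to fit in memory. For a file of 17.6GB it may be necessary to filter or aggregate each chunk inside the loop and keep only the reduced result.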
Inspecting the output of readLines in R:
View(x)
x
[1] "\"\",\"myid\",\"var\""
[2] "\"1\",1,0.594902001088485"
[3] "\"2\",2,0.851559089729562"
[4] "\"3\",3,0.81390100880526"
[5] "\"4\",4,0.380423351423815"
[6] "\"5\",5,0.492308202432469"
[7] "\"6\",6,0.993377464590594"
[8] "\"7\",7,0.174089450156316"
[9] "\"8\",8,0.834280799608678"
[10] "\"9\",9,0.395815373631194"
[11] "\"10\",10,0.969056134112179"
Thanks in advance for any help.
Judging from the output you get from readLines, I assume your CSV file looks something like this, with every field quoted:
"","myid","var"
"1","1","0.5949020"
"2","2","0.8515591"
"3","3","0.8139010"
"4","4","0.3804234"
"5","5","0.4923082"
"6","6","0.9933775"
"7","7","0.1740895"
"8","8","0.8342808"
"9","9","0.3958154"
"10","10","0.9690561"
That is, your values are comma-separated and enclosed in double quotes. When I read this file in with readLines, I get output like yours:
dat
[1] "\"\",\"myid\",\"var\"" "\"1\",\"1\",\"0.5949020\""
[3] "\"2\",\"2\",\"0.8515591\"" "\"3\",\"3\",\"0.8139010\""
[5] "\"4\",\"4\",\"0.3804234\"" "\"5\",\"5\",\"0.4923082\""
[7] "\"6\",\"6\",\"0.9933775\"" "\"7\",\"7\",\"0.1740895\""
[9] "\"8\",\"8\",\"0.8342808\"" "\"9\",\"9\",\"0.3958154\""
[11] "\"10\",\"10\",\"0.9690561\""
So what you need to do is
unlist(strsplit(..., split = ","))
to split each line on commas, and
gsub("\"", "", ...)
to strip the double quotes, which gives us:
unlist(strsplit(gsub("\"", "", dat), split = ","))
[1] "" "myid" "var" "1" "1" "0.5949020" "2"
[8] "2" "0.8515591" "3" "3" "0.8139010" "4" "4"
[15] "0.3804234" "5" "5" "0.4923082" "6" "6" "0.9933775"
[22] "7" "7" "0.1740895" "8" "8" "0.8342808" "9"
[29] "9" "0.3958154" "10" "10" "0.9690561"
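The result above is still one flat character vector rather than a data frame. One way to finish the job, assuming every line has exactly three fields as in the sample file, is to refill the vector row by row into a matrix and convert the columns to numbers. The names fields, m and out are illustrative, and dat is hard-coded here to the first few sample lines so the snippet runs on its own.

```r
# Sample of the raw lines shown above (every field quoted);
# in practice `dat` would come from readLines().
dat <- c('"","myid","var"',
         '"1","1","0.5949020"',
         '"2","2","0.8515591"',
         '"3","3","0.8139010"')

# Strip the quotes and split every line on commas
fields <- unlist(strsplit(gsub("\"", "", dat), split = ","))

# Each line has 3 fields, so refill them row by row into a matrix
m <- matrix(fields, ncol = 3, byrow = TRUE)

# Row 1 is the header; column 1 is write.csv's row-name column
out <- as.data.frame(m[-1, -1, drop = FALSE], stringsAsFactors = FALSE)
names(out) <- m[1, -1]
out[] <- lapply(out, as.numeric)   # the fields are still character strings
out
```

The byrow = TRUE argument matters: matrix() fills column by column by default, which would scramble the fields across rows.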
Here is a complete sequence of instructions to transform the data you posted into a data frame.
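set.seed(1234) # Make the results reproducible
write.csv(data.frame(myid=1:10,var=runif(10)),"temp.csv")
dat <- readLines("temp.csv")              # one string per line of the file
df1 <- strsplit(dat[-1], ",")             # drop the header line, split each line on commas
df1 <- do.call(rbind, df1)                # stack the pieces into a character matrix
df1 <- df1[,-1]                           # drop the quoted row-name column
df1 <- as.data.frame(df1)
df1[] <- lapply(df1, function(x) as.numeric(as.character(x)))  # convert every column to numeric
names(df1) <- gsub('"', '', strsplit(dat[1], ',')[[1]][-1], fixed = TRUE)  # column names from the header
df1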
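For comparison, a sketch under the same assumptions (the layout written by write.csv): read.csv() can take the lines from readLines directly through its text argument and handles the quoting and type conversion itself. A temporary file from tempfile() stands in for "temp.csv" so the snippet does not overwrite anything, and df2 is just an illustrative name.

```r
# Recreate the sample file (a temporary path is used instead of "temp.csv")
set.seed(1234)
orig <- data.frame(myid = 1:10, var = runif(10))
path <- tempfile(fileext = ".csv")
write.csv(orig, path)
dat <- readLines(path)

# read.csv() strips the quotes and converts types itself; `text =` takes
# the character vector of lines directly instead of a file name.
df2 <- read.csv(text = dat)
df2 <- df2[-1]   # drop the unnamed row-name column, which read.csv names "X"
df2
```

Unlike the manual version, read.csv keeps myid as an integer column. This still holds all lines in memory at once, so for a very large file it would be applied per chunk rather than to the whole readLines output.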