Reputation: 103
I have a very large text file that I cannot read in R with read.table() because of its size.
I know that readLines() lets you specify how many lines to import, but I need to read one line at a time in a for loop and either write it to a new file or store it in a vector, list, or similar.
In Python, that would be something like:
with open("myfile.txt", mode="r") as myfile:
    for line in myfile:
        line = line.strip()        # drop the newline and surrounding whitespace
        fields = line.split("\t")  # split the line on tabs
        print(fields)
Is that possible with R?
Upvotes: 1
Views: 1317
Reputation: 1046
While Яaffael's answer is enough, this is a typical use case for the iterators package.
With this package you can iterate over the file line by line without loading all the data into memory. As an example, I will read the Airlines data with this method. Download the 1988 file and run the following:
> install.packages('iterators')
> library(iterators)
> con <- bzfile('1988.csv.bz2', 'r')
OK, now you have a connection to your file. Let's create an iterator:
> it <- ireadLines(con, n=1) ## read just one line from the connection (n=1)
Just to test:
> nextElem(it)
and you will see something like:
[1] "1988,1,9,6,1348,1331,1458,1435,PI,942,NA,70,64,NA,23,17,SYR,BWI,273,NA,NA,0,NA,0,NA,NA,NA,NA,NA"
> nextElem(it)
and you will see the next line, and so on.
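If you want to read line by line until the end of the file, you can wrap the call in tryCatch(), for example:
> tryCatch(expr=nextElem(it), error=function(e) return(FALSE))
Once the file is exhausted, nextElem() raises an error and this call returns a logical FALSE instead, which you can use as the stop condition in a loop.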
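To make the stopping logic concrete, here is a minimal sketch of a full loop built on the same idea. It assumes the iterators package is installed. The sample file, the tab separator, and the variable names are made up for illustration, and it uses NULL rather than FALSE as the end-of-file marker:

```r
library(iterators)

# Hypothetical sample file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("a\tb\tc", "1\t2\t3", "4\t5\t6"), path)

con <- file(path, "r")
it <- ireadLines(con, n = 1)  # one line per call to nextElem()

fields <- list()
repeat {
  # nextElem() signals a "StopIteration" error when nothing is left to read;
  # any other error is re-raised instead of being silently swallowed.
  line <- tryCatch(nextElem(it), error = function(e) {
    if (identical(conditionMessage(e), "StopIteration")) NULL else stop(e)
  })
  if (is.null(line)) break
  # Split the line on tabs and keep the result (assumed per-line processing).
  fields[[length(fields) + 1]] <- strsplit(line, "\t", fixed = TRUE)[[1]]
}
close(con)  # the connection was opened here, so it is closed here
```

Only one line is held in memory at a time. The `fields` list is just there to show the result, and in a real run on a huge file you would write each processed line out instead of accumulating it.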
Upvotes: 0
Reputation: 20045
Give scan() a try. Using skip you can skip the lines you have already read, and using nlines you can specify the number of lines you would like to read. Then you can loop through the file.
> large <- 10000
> m <- matrix(sample(c(0,1),3*7,replace=TRUE), ncol=3)
> write.table(m, "test.txt")
> for(i in 0:large) {
+ l <- scan("test.txt", what = character(), skip = i, nlines = 1)
+ if(length(l) == 0) break
+ print (l)
+ }
Read 3 items
[1] "V1" "V2" "V3"
Read 4 items
[1] "1" "0" "1" "0"
Read 4 items
[1] "2" "0" "0" "0"
Read 4 items
[1] "3" "0" "0" "0"
Read 4 items
[1] "4" "0" "1" "1"
Read 4 items
[1] "5" "1" "1" "1"
Read 4 items
[1] "6" "1" "0" "1"
Read 4 items
[1] "7" "0" "0" "1"
Read 0 items
The code serves the purpose of illustrating how to apply scan() and how to know when you have to stop reading.
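One caveat with passing a file name plus skip is that every call reopens the file and skips past all earlier lines again, so the total work grows quadratically with the number of lines. A possible alternative, sketched here with readLines() on an open connection rather than scan(), reads each line exactly once. The sample file and variable names are assumptions for illustration:

```r
# Hypothetical sample file so the sketch is self-contained.
path <- tempfile(fileext = ".txt")
writeLines(c("V1\tV2\tV3", "1\t0\t1", "0\t0\t1"), path)

con <- file(path, open = "r")  # an open connection remembers its position
n_lines <- 0
last_fields <- NULL
# readLines() returns a zero-length vector once the end of the file is reached.
while (length(line <- readLines(con, n = 1)) > 0) {
  # Strip surrounding whitespace, then split on tabs (like strip/split in Python).
  last_fields <- strsplit(trimws(line), "\t", fixed = TRUE)[[1]]
  n_lines <- n_lines + 1
  print(last_fields)
}
close(con)
```

An empty line in the file comes back as "" (length 1), so it does not end the loop early. Only the true end of the file does.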
Upvotes: 1