Reputation: 85
I had about 200 different files (each one a big matrix, 465x1080, which is huge for me). I then used cbind2 to combine them all into one bigger matrix (465x200000).
I did that because I needed to create one separate file for each row (465 files), and I thought it would be easier for R to load the data into memory only ONCE from a single file and then go row by row, creating a separate file for each one, instead of opening and closing 200 different files for every row.
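Roughly, the combining step looked something like this (the file names here are simplified, just to show the idea; in reality I used cbind2):

files <- list.files(pattern = "\\.txt$")  # my ~200 input files, each a 465x1080 matrix
AllData <- do.call(cbind, lapply(files, function(f) as.matrix(read.table(f))))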
Is this really the faster way? (I ask because it is currently taking quite a long time.) When I check the Windows Task Manager, the RAM used by R keeps bouncing between 700MB and 1GB (roughly twice every second). It seems the main file was not loaded just once, but is being loaded and erased from memory on every iteration (which could be why it is a bit slow?).
I am a beginner, so what I wrote might not make much sense.
Here is my code (the +1 and -1 are there because the original data has 1 extra column that I don't need in the new files):
extractStationData <- function(OriginalData, OutputName = "BCN-St") {
  for (i in 1:nrow(OriginalData)) {
    OutputData <- matrix(NA, nrow = ncol(OriginalData) - 1, ncol = 3)
    colnames(OutputData) <- c("Time", "Bikes", "Slots")
    for (j in 1:(ncol(OriginalData) - 1)) {
      OutputData[j, 1] <- colnames(OriginalData[j + 1])
      OutputData[j, 2] <- OriginalData[i, j + 1]
    }
    write.table(OutputData, file = paste(OutputName, i, ".txt", sep = ""))
    print(i)
  }
}
Any thoughts? Maybe I should just create an object (with the huge file) before the first for loop, so it is loaded only once?
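For example, something like this is what I mean (the file name here is just an example):

AllData <- as.matrix(read.table("AllStations.txt"))  # load the combined file into memory once
extractStationData(AllData)                          # then write one file per row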
Thanks in advance.
Upvotes: 0
Views: 110
Reputation: 3650
Let's assume you have already created the 465x200000 matrix and only the extractStationData function is in question. Then we can modify it, for example, like this:
require(data.table)

extractStationData <- function(d, OutputName = "BCN-St") {
  d2 <- d[, -1]  # remove the column you do not need
  # create the empty output matrix once, outside the loop:
  emptyMat <- matrix(NA, nrow = ncol(d2), ncol = 3)
  colnames(emptyMat) <- c("Time", "Bikes", "Slots")
  emptyMat[, 1] <- colnames(d2)
  for (i in 1:nrow(d2)) {
    OutputData <- emptyMat
    OutputData[, 2] <- d2[i, ]
    fwrite(OutputData, file = paste(OutputName, i, ".txt", sep = "")) # use fwrite for speed
  }
}
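Usage would then be, for example (assuming your combined 465x200000 matrix is already in memory as d):

extractStationData(d)  # writes BCN-St1.txt, BCN-St2.txt, ... one file per row of d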
V2:
If your OriginalData is in matrix format, this approach for creating the list of new data.tables looks quite fast:
extractStationData2 <- function(d, OutputName = "BCN-St") {
  d2 <- d[, -1]  # remove the column you don't need
  ds <- split(d2, 1:nrow(d2))  # one list element per row
  r <- lapply(ds, function(x) {
    k <- data.table(colnames(d2), x, NA)
    setnames(k, c("Time", "Bikes", "Slots"))
    k
  })
  r
}
dl <- extractStationData2(d)  # list of new data objects
# write to files:
OutputName <- "BCN-St"
for (i in seq_along(dl)) {
  fwrite(dl[[i]], file = paste(OutputName, i, ".txt", sep = ""))
}
It should also work for a data.frame with a minor change:
k <- data.table(colnames(d2), t(x), NA)
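Putting that together, a minimal sketch of the data.frame variant (untested; the function name is just an example):

extractStationData2df <- function(d, OutputName = "BCN-St") {
  d2 <- d[, -1]  # remove the column you don't need
  ds <- split(d2, 1:nrow(d2))  # list of one-row data.frames
  lapply(ds, function(x) {
    k <- data.table(colnames(d2), t(x), NA)  # t(x) turns the single row into a column
    setnames(k, c("Time", "Bikes", "Slots"))
    k
  })
}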
Upvotes: 1