Reputation: 28169
I'm trying to write an R data frame out to a text file.
The data set is roughly 1500 x 700, and looping through the data frame cell by cell takes a while. Is there any way to speed this up?
My data frame looks like this:
>train2
score x1 x2 x3 x4 x5 ... x700
0 0 1 1 1 0 0
1 0 1 0 0 0 0
0 1 0 1 1 1 0
3 0 1 1 1 0 0
1 0 1 0 1 0 0
2 1 1 1 1 0 1
0 0 1 1 0 0 0
... . . . . . .
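For anyone who wants to reproduce this, a toy version of the data can be built as below. This is only a sketch: it assumes the first three rows shown above and the columns x1 to x5, whereas the real data frame has about 1500 rows and 700 feature columns.

```r
# Toy stand-in for train2: first three sample rows, x1..x5 only (assumption).
train2 <- data.frame(
  score = c(0, 1, 0),
  x1 = c(0, 0, 1),
  x2 = c(1, 1, 0),
  x3 = c(1, 0, 1),
  x4 = c(1, 0, 1),
  x5 = c(0, 0, 1)
)
print(train2)
```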
In the output file I only include cells that are non-zero.
So the output for rows 1-3 would be:
0 | x2:1 x3:1 x4:1
1 | x2:1
0 | x1:1 x3:1 x4:1 x5:1
My current code runs like this:
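# Prefix for each output line: "<score> | "
pt1 <- paste(train2$score, " | ", sep = "")
collect1 <- c()
for (j in 1:nrow(train2)) {
  word1 <- pt1[j]
  # Feature columns start at column 2 in the sample above (column 1 is score)
  for (i in 2:ncol(train2)) {
    if (train2[j, i] != 0) {
      word1 <- paste(word1, colnames(train2)[i], ":", train2[j, i], " ", sep = "")
    }
  }
  collect1 <- c(collect1, word1)
  # Progress report every 100 rows
  if (j %% 100 == 0) {
    print(j); flush.console()
    gc()
  }
}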
Each run takes about 3-4 minutes. Is there anything obvious I can do to improve performance?
EDIT: after the loops have completed, the resulting character vector collect1
is written to a text file using:
write(collect1, file="outPut1.txt")
Upvotes: 1
Views: 240
Reputation: 417
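Try vectorizing the operation as follows. The loop most likely spends its time indexing the data frame one cell at a time and growing collect1 with c(), so working on a whole row at once avoids both. (I put 'score' in a separate variable and removed it from 'train3' so I wouldn't have to subset the data frame in the anonymous function):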
score <- train2$score
train3 <- train2[, -1]
cols <- colnames(train3)
res <- apply(train3, 1, function(x) {
  idx <- x != 0
  nms <- cols[idx]
  vals <- x[idx]
  paste(nms, vals, sep = ":", collapse = " ")
})
out <- paste(score, "|", as.vector(res))
print(out)
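To try this end to end, including the file output the question asks for, here is a minimal self-contained sketch. The toy data frame (three rows, x1 to x5) and the temporary file path out_file are assumptions made for illustration; swap in the real train2 and file name. It uses writeLines(), which writes one element of the character vector per line.

```r
# Toy stand-in for train2 (assumption: three rows, x1..x5 only).
train2 <- data.frame(
  score = c(0, 1, 0),
  x1 = c(0, 0, 1), x2 = c(1, 1, 0), x3 = c(1, 0, 1),
  x4 = c(1, 0, 1), x5 = c(0, 0, 1)
)

score <- train2$score
train3 <- train2[, -1]
cols <- colnames(train3)

# Build the "name:value" part for each row, keeping non-zero cells only.
res <- apply(train3, 1, function(x) {
  idx <- x != 0
  paste(cols[idx], x[idx], sep = ":", collapse = " ")
})
out <- paste(score, "|", as.vector(res))

# Write one line per row; out_file is a placeholder path.
out_file <- tempfile(fileext = ".txt")
writeLines(out, out_file)
print(out[1])  # first line: "0 | x2:1 x3:1 x4:1"
```

A row that contains only zeros produces just the score followed by "|", since paste() with collapse returns an empty string for zero-length input.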
Upvotes: 4