Reputation: 9
I'm writing a script that has to build a large matrix. I want to take a vector of names and, for each name, get data from a different data frame, do some operations on it, and then return a vector of data for that name. For example:
allNew=matrix(ncol=ncol(X)-1);
for(name in list)
{
tmpdata=all[grep(name,list$Names),];
data=(as.data.frame(apply(tmpdata[,2:(ncol(tmpdata)-1)],2,sum))==nrow(tmpdata))*1
colnames(data)=name;
data=t(data);
allNew=rbind(allNew,data);
}
The names list has on the order of 10,000 entries, and for each name tmpdata has 1-5 rows. I'm running my code on my lab's Linux server with about 8 GB of RAM.
Somehow I feel this is taking a lot longer than it should: it takes a few minutes. How can I do this more efficiently?
Upvotes: 0
Views: 718
Reputation: 145805
As the comments pointed out, growing an object one line at a time is much slower than overwriting parts of a pre-allocated object. Something like this should work--though without any test data it's hard to be sure.
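allNew=matrix(NA, ncol=ncol(X)-1, nrow = length(list));
for(i in seq_along(list))
{
# `list` is the vector of names from the question, so index it directly
name <- list[[i]]
# row selection as in the question, using the loop variable `name` as the pattern
tmpdata=all[grep(name,list$Names), ]
data=(as.data.frame(apply(tmpdata[, 2:(ncol(tmpdata)-1)], 2, sum))==nrow(tmpdata))*1
colnames(data)=name
# write into the pre-allocated row instead of growing the matrix with rbind()
allNew[i, ] = t(data)
}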
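To see the difference in isolation, here is a minimal sketch on made-up data. The data frame `toy`, its columns `a` and `b`, the vector `ids`, and the helper `all_ones()` are all hypothetical stand-ins, not the objects from the question. It also assumes an exact match on the name is what's wanted, so it uses `==` rather than `grep()` (a pattern like "id1" would also match "id10").

```r
# Toy data: every name and value here is invented for illustration.
set.seed(1)
n_names <- 200
toy <- data.frame(
  Names = rep(paste0("id", seq_len(n_names)), each = 3),
  a = rbinom(n_names * 3, 1, 0.8),
  b = rbinom(n_names * 3, 1, 0.8),
  stringsAsFactors = FALSE
)
ids <- unique(toy$Names)

# Returns 1 for a column if every row for this name holds a 1, else 0.
all_ones <- function(id) {
  rows <- toy[toy$Names == id, c("a", "b"), drop = FALSE]
  (colSums(rows) == nrow(rows)) * 1
}

# Growing: each rbind() copies everything accumulated so far.
grown <- NULL
for (id in ids) grown <- rbind(grown, all_ones(id))

# Pre-allocating: the matrix is created once and filled row by row.
filled <- matrix(NA_real_, nrow = length(ids), ncol = 2,
                 dimnames = list(ids, c("a", "b")))
for (i in seq_along(ids)) filled[i, ] <- all_ones(ids[i])
```

Both loops produce the same values; wrapping each one in `system.time()` and increasing `n_names` should show how the gap widens as the number of names grows.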
Upvotes: 1