Reputation: 3536
I'm trying to initialize a data.frame with 2 columns and 10 rows, to which I'll go on adding rows. This is the code that I have:
result.frame = as.data.frame(matrix(ncol=2, nrow=10))
names(result.frame) = c("ID", "Value")
for (i in 1:10) {
value = somefunction(i)
rbind(result.frame, c(i, value))
}
When I run this, I'm just getting a data.frame containing NA. Also, I read on SO that dynamically growing structures is one of the least efficient ways to code in R. If this is true, what is the right way to accomplish something like this?
Thanks a lot!
Upvotes: 1
Views: 13475
Reputation: 43255
You aren't assigning the result of rbind to anything! The code below does what I think you were trying to show. However, as you mention, it is inefficient.
result.frame = as.data.frame(matrix(ncol=2, nrow=10))
names(result.frame) = c("ID", "Value")
for (i in 1:10) {
value = 2 * i
result.frame = rbind(result.frame, c(i, value))
}
Instead, make the data.frame the full size you want and assign into it:
result.frame = as.data.frame(matrix(ncol=2, nrow=20))
names(result.frame) = c("ID", "Value")
for (i in 11:20) {
value = 2 * i
result.frame[i,] = c(i, value)
}
Brief timings:
> result.frame=data.frame()
> system.time(for(i in 1:10000){result.frame=rbind(result.frame, c(i,i*2))})
user system elapsed
9.844 0.000 9.874
> result.frame=as.data.frame(matrix(ncol=2, nrow=10000))
> system.time(for(i in 1:10000){result.frame[i,]=c(i,i*2)})
user system elapsed
7.041 0.056 7.120
>
Aside from time efficiency, there are also important memory concerns as the data gets larger. To perform the rbind operation, the data must be copied, which means you need roughly twice the memory, in contiguous blocks. Assigning into an already created data.frame doesn't have this issue.
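A variation on the same preallocation idea, offered as a sketch rather than a benchmarked recommendation: fill plain vectors of the final length inside the loop and build the data.frame once at the end. Here some_function is a hypothetical stand-in for the asker's somefunction, and n = 10 is assumed from the question.

```r
# Hypothetical stand-in for the asker's somefunction()
some_function <- function(i) 2 * i

n <- 10                 # assumed number of rows, taken from the question
ids <- integer(n)       # preallocated to the final length
values <- numeric(n)

for (i in seq_len(n)) {
  ids[i] <- i
  values[i] <- some_function(i)
}

# Build the data.frame once, after the loop has finished
result.frame <- data.frame(ID = ids, Value = values)
```

Assigning into a vector element is generally cheaper than assigning into a data.frame row, so this may be worth trying if the loop itself turns out to be the bottleneck.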
Upvotes: 6
Reputation: 17090
What happens is this: the NAs come from matrix, since you have not initialized it with any values. And the rbind doesn't appear to do anything because you have discarded its return value.
result.frame = data.frame( )
for( i in 1:10 ) {
value = somefunction( i )
result.frame = rbind( result.frame, c( i, value ) )
}
colnames( result.frame ) <- c( "ID", "Value" )
Don't worry about efficiency unless we are talking about millions of operations here. Normally the calculations themselves are much more expensive than the small memory reallocation that R needs to do here.
Furthermore, your own efficiency as a programmer matters too, and it suffers when you first have to work out exactly how many rows you will need.
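If the number of rows really isn't known up front, one possible middle ground (a sketch, not the only way to do it) is to collect one-row data.frames in a list and bind them in a single call at the end. Again, some_function is a hypothetical stand-in for somefunction, and the loop bound of 10 is only for illustration.

```r
# Hypothetical stand-in for the asker's somefunction()
some_function <- function(i) 2 * i

rows <- list()
for (i in 1:10) {
  # Append a one-row data.frame; no need to know the final size
  rows[[length(rows) + 1]] <- data.frame(ID = i, Value = some_function(i))
}

# Bind all collected rows in one call instead of once per iteration
result.frame <- do.call(rbind, rows)
```

The column names come from the one-row data.frames, so no separate colnames call is needed.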
Upvotes: 3