Niccola Tartaglia

Reputation: 1667

Loop to dynamically fill dataframe R

I am running a for loop to dynamically fill a data frame (I know a baby seal dies somewhere because I am using a for loop).

I have something like this in mind (the 5 is a placeholder for a function that returns a scalar):

results<-data.frame(matrix(NA, nrow = length(seq(1:10)), ncol = 
length(seq(1:10))))
rows<-data.frame(matrix(NA, nrow = 1, ncol = 1))
for (j in seq(1:10)){
rows<-data.frame()
for (i in seq(1:10)){
   rows<-cbind(rows,5)
}
results<-cbind(results,rows)
}

I get the following error message with my approach above.

Error in match.names(clabs, names(xi)) : 
names do not match previous names

Is there an easier way?

Upvotes: 3

Views: 7744

Answers (2)

MKR

Reputation: 20095

I'm not sure what your intention is. Keeping your approach, one way to fix the problem is to change the for loop so that rows is initialized with the first value. The inner for loop should then run over seq(2:10), i.e. over the remaining nine values.

The error occurs because you are attempting to cbind an empty data.frame with a value.

for (j in seq(1:10)){
  rows<-data.frame(5)    # Initialize with the 1st value
  for (i in seq(2:10)){  # Loop over the remaining 9 values
    rows<-cbind(rows,5)
  }
  results<-cbind(results,rows)
}

Upvotes: 1

Scott Ritchie

Reputation: 10543

Dynamically filling an object using a for loop is fine - what causes problems is when you dynamically build an object using a for loop (e.g. by adding columns or rows with cbind and rbind).

When you build something dynamically, R has to allocate new memory for the object on every iteration, because it keeps increasing in size. This causes the for loop to slow down as the object gets bigger.

When you create the object beforehand (e.g. a data.frame with the right number of rows and columns), and fill it in by index, the for loop doesn't have this problem.
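To make the difference concrete, here is a minimal sketch that contrasts the two patterns on a plain numeric vector. The function names grow and prealloc and the size n are made up for illustration; the timings depend on your machine, so none are shown.

```r
n <- 1000

# Growing: every c() call creates a new, longer vector and copies the old one
grow <- function(n) {
  x <- numeric(0)
  for (i in seq_len(n)) x <- c(x, i^2)
  x
}

# Preallocating: the vector is created once and filled in by index
prealloc <- function(n) {
  x <- numeric(n)
  for (i in seq_len(n)) x[i] <- i^2
  x
}

# Both approaches produce the same values
same_result <- identical(grow(n), prealloc(n))

# Compare the run time of each approach
system.time(grow(n))
system.time(prealloc(n))
```

With a small n the difference is hard to see; it should become more noticeable as n increases.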

One final thing to keep in mind is that each column of a data.frame is stored as a separate vector in memory (and matrices are stored column by column), so it's usually more efficient to fill these in one column at a time.

With all that in mind we can revise your code as follows:

results <- data.frame(matrix(NA, nrow = 10, ncol = 10))
# The outer loop runs over columns so that each column is filled in one go
for (colIdx in seq_len(ncol(results))) {
  for (rowIdx in seq_len(nrow(results))) {
    results[rowIdx, colIdx] <- 5 # or whatever value you want here
  }
}
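As for whether there is an easier way: one option that avoids the explicit loops is a nested sapply. This is only a sketch, and compute_value is a hypothetical stand-in for your function that returns a scalar; it is not part of the original code.

```r
# Hypothetical stand-in for the function that returns a scalar
compute_value <- function(i, j) i + j

# The outer sapply runs over columns (j), the inner one over rows (i).
# Each inner call returns a vector of 10 values, so the result is a 10 x 10 matrix.
values <- sapply(1:10, function(j) {
  sapply(1:10, function(i) compute_value(i, j))
})

# Convert to a data.frame if that is the shape you need downstream
results <- as.data.frame(values)
dim(results)
```

If every cell holds the same kind of value, keeping the result as a matrix rather than converting it to a data.frame may be simpler.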

Upvotes: 13
