Ana
Ana

Reputation: 1586

What can substitute a nested loop in R

I'd like to create the dataframe output from dataframe input in R by running multiple scenarios given two variables x and y. The column output is the sum of all the values in the column value where there xcol < x & ycol < y.

input =
xcol ycol value
1   5   4
2   6   9
3   7   8
4   9   7
5   14  8

And

output= 
x   y   results
2   5   0
2   10  4
2   15  35
...
6   5   0
6   10  27
6   15  35

My code is currently like this:

for (x in 2:6) {
  if (x%% 2){
    next
  }
  for (y in 5:15) {
    if (y %% 5){
      next
    }
    print(x)
    print(y)
    print(sum(input$value[!is.na(input$xcol) & !is.na(input$ycol) & !is.na(input$value) & 
              input$xcol < x &  input$ycol < y]))
  }
}

There should be a better way to replace this nested loop using lapply & sapply and create a dataframe I would think. I'd appreciate any helps.

Thanks

Upvotes: 0

Views: 1053

Answers (2)

r2evans
r2evans

Reputation: 160447

This seems more like a design-of-experiments, in a sense, where you are iterating over different possible values of x and y.

xs <- 2:6
ys <- 5:15
eg <- expand.grid(x = xs, y = ys)
head(eg)
#   x y
# 1 2 5
# 2 3 5
# 3 4 5
# 4 5 5
# 5 6 5
# 6 2 6

I think your %% filtering should be done outside/before this, so:

xs <- xs[!xs %% 2]
ys <- ys[!ys %% 5]
eg <- expand.grid(x = xs, y = ys)
head(eg)
#   x  y
# 1 2  5
# 2 4  5
# 3 6  5
# 4 2 10
# 5 4 10
# 6 6 10

From here, you can just iterate over the rows:

eg$out <- sapply(seq_len(nrow(eg)), function(r) {
  sum(input$value[ complete.cases(input) & input$xcol < eg$x[r] & input$ycol < eg$y[r] ])
})
eg
#   x  y out
# 1 2  5   0
# 2 4  5   0
# 3 6  5   0
# 4 2 10   4
# 5 4 10  21
# 6 6 10  28
# 7 2 15   4
# 8 4 15  21
# 9 6 15  36

I think your output variable is a little off, since "2,15" should only include input$value[1] (x < 2 is the limiting factor). (Other differences exist.)

Regardless of your actual indexing logic, I suggest this methodology over a double-for or double-lapply implementation.

NB:

  1. These commands are functionally equivalent with this input:

    complete.cases(input)                                         # 1
    complete.cases(input[c("xcol","ycol","value")])               # 2
    !is.na(input$xcol) & !is.na(input$xcol) & !is.na(input$value) # 3
    

    I used the first since "code golf", but if your actual input data.frame contains other columns, you may prefer the second to be more selective of which columns require non-NA values.

  2. expand.grid works great for this type of expansion. However, if you are looking at significantly larger datasets (including if your filtering is more complex than %% offers), than it can be a little expensive as it must create the entire data.frame in memory. Python's use of lazy iterators would be useful here, in which case you might prefer to use https://stackoverflow.com/a/36144255/3358272 (expanded function in a github gist with some docs: https://gist.github.com/r2evans/e5531cbab8cf421d14ed).

Upvotes: 1

Hallo
Hallo

Reputation: 142

This isn't the most elegant solution, but you can just replace the for loops with lapply:

lapply (2:6, function(x) {
  if (x%% 2){
    next
  }
  lapply (5:15, function(y) {
    if (y %% 5){
      next
    }
    print(x)
    print(y)
    print(sum(input$value[!is.na(input$xcol) & !is.na(input$ycol) & !is.na(input$value) & 
              input$xcol < x &  input$ycol < y]))
  })
})

Upvotes: 0

Related Questions