nniloc

Reputation: 4243

Map a function to two data frames of unequal lengths

For each row in df1, I would like to call mult 10 times, once for each year in df2.

One option I can think of is to repeat df1 once per year and join it to df2. However, my actual data are much larger (~20k sections, 15 areas, and 100 years), so I am looking for a more efficient way to do this.

# df1

  section area         a         b         c
1       1    1 0.1208916 0.7235306 0.7652636
2       2    1 0.8265642 0.2939602 0.6491496
3       1    2 0.9101611 0.7363248 0.1509295
4       2    2 0.8807047 0.5473221 0.6748055
5       1    3 0.2343558 0.2044689 0.9647333
6       2    3 0.4112479 0.9523639 0.1533197


----------


# df2

   year         d
1     1 0.7357432
2     2 0.4591575
3     3 0.3654561
4     4 0.1996439
5     5 0.2086226
6     6 0.5628826
7     7 0.4772953
8     8 0.8474007
9     9 0.8861693
10   10 0.6694851

mult <- function(a, b, c, d) {a * b * c * d}

The desired output would look something like this:

   section area year                 e
1        1    1    1 results of mult()
2        2    1    1 results of mult()
3        1    2    1 results of mult()
4        2    2    1 results of mult()
5        1    3    1 results of mult()
6        2    3    1 results of mult()
7        1    1    2 results of mult()
8        2    1    2 results of mult()
...

dput(df1)

structure(list(section = c(1L, 2L, 1L, 2L, 1L, 2L), area = c(1L, 
1L, 2L, 2L, 3L, 3L), a = c(0.12089157756418, 0.826564211165532, 
0.91016107192263, 0.880704707000405, 0.234355789143592, 0.411247851792723
), b = c(0.72353063733317, 0.293960151728243, 0.736324765253812, 
0.547322086291388, 0.204468948533759, 0.952363904565573), c = c(0.765263637062162, 
0.649149592733011, 0.150929539464414, 0.674805536167696, 0.964733332861215, 
0.15331974090077)), out.attrs = list(dim = structure(2:3, .Names = c("section", 
"area")), dimnames = list(section = c("section=1", "section=2"
), area = c("area=1", "area=2", "area=3"))), class = "data.frame", row.names = c(NA, 
-6L))

dput(df2)

structure(list(year = 1:10, d = c(0.735743158031255, 0.459157506935298, 
0.365456136409193, 0.199643932981417, 0.208622586680576, 0.562882597092539, 
0.477295308141038, 0.847400720929727, 0.886169332079589, 0.669485098216683
)), class = "data.frame", row.names = c(NA, -10L))

Edit: full-sized toy dataset

library(dplyr)

df1 <- expand.grid(section = 1:20000,
                   area = 1:15) %>%
  mutate(a = runif(300000),
         b = runif(300000),
         c = runif(300000))


df2 <- data.frame(year = 1:100,
                  d = runif(100))

Upvotes: 1

Views: 108

Answers (1)

Ronak Shah

Reputation: 389355

You can use crossing from tidyr to create every combination of rows from df1 and df2, then apply mult to the combined columns.

tidyr::crossing(df1, df2) %>% dplyr::mutate(e = mult(a, b, c, d))
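Note that crossing de-duplicates and sorts its inputs, so the row order may differ from the layout shown in the question. If the year-major order matters, one possible alternative is to build the row indices by hand in base R. The following is only a sketch, not a benchmarked solution: the small df1 and df2 below are made-up stand-ins for the question's data, and the names i, j, and res are chosen here for illustration.

```r
# Same function as in the question
mult <- function(a, b, c, d) a * b * c * d

# Small stand-in data (values invented for this sketch)
df1 <- data.frame(section = c(1L, 2L, 1L, 2L),
                  area    = c(1L, 1L, 2L, 2L),
                  a = c(0.1, 0.2, 0.3, 0.4),
                  b = c(0.5, 0.6, 0.7, 0.8),
                  c = c(0.9, 1.0, 1.1, 1.2))
df2 <- data.frame(year = 1:3, d = c(2, 3, 4))

# Index vectors: df1 rows cycle fastest, each df2 row is held for a full cycle
i <- rep(seq_len(nrow(df1)), times = nrow(df2))
j <- rep(seq_len(nrow(df2)), each  = nrow(df1))

# Keep only the columns needed in the output and compute e in one vectorised call
res <- data.frame(section = df1$section[i],
                  area    = df1$area[i],
                  year    = df2$year[j],
                  e       = mult(df1$a[i], df1$b[i], df1$c[i], df2$d[j]))
```

This still materialises nrow(df1) * nrow(df2) rows (30 million for the full-sized toy dataset), but it avoids copying the a, b, c, and d columns into the result.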

Upvotes: 3
