Reputation: 97
Good day,
I need a function that creates increasing ID's for two parameters. I came up with this function which works fine, but I want it to be vectorized and I cannot seem to avoid a Big O factor of N². Are there any 'better' ways to do this?
Standard function:
threshold <- 3
calculateID <- function(p, r) {
return((p-1) * threshold + r)
}
calculateID(1, 1) #returns 1
calculateID(1, 2) #returns 2
calculateID(1, 3) #returns 3
calculateID(2, 1) #returns 4
#.....
calculateID(5, 3) #returns 15
Vectorized function, I would like to give the two parameters as vectors so the function only has to be called once:
threshold <- 3
calculateIDVectorized <- function(p, r) {
return(unlist(
lapply(p, function(x) {
lapply(r, function(y) {
(x-1) * threshold + y
})
})
))
}
calculateIDVectorized(c(1, 2, 3, 4, 5), c(1, 2, 3)) # should return 1-15
To clarify: I want that every p and r argument is used so you should always get a result of length(p * r)
Upvotes: 4
Views: 85
Reputation: 1367
Since the OP was interested in fast computation, I compared the solutions:
library(microbenchmark)
p <- c(1:500) # using larger data set
r <- c(1:20)
threshhold = length(r) # parameterizing threshold
m = microbenchmark(
tidy= crossing(p, r) %>%
rowwise %>%
transmute(out = calculateID(p, r)) %>%
pull(out),
dcv = do.call(Vectorize(calculateID),unname(rev(expand.grid(r,p)))),
numbering = rev(expand.grid(r,p)) %>%
arrange(Var2, Var1) %>%
transmute(out = row_number()) %>%
pull(out),
hybrid = rev(expand.grid(r,p)) %>%
rowwise() %>%
transmute(out = calculateID(Var2, Var1)) %>%
pull(out),
outer = as.vector(t(outer(p, r, calculateID))),
outer_c = c(t(outer(p, r, calculateID))),
david = rep((p - 1), each = length(r)) * threshold + r
)
m
# Unit: microseconds
# expr min lq mean median uq max neval
# tidy 45441.869 47370.776 52123.6770 49482.1970 54158.4285 116780.840 100
# dcv 16259.935 17156.225 19641.6731 17897.8885 21576.0865 55489.586 100
# numbering 5947.147 6379.337 7127.5125 6576.3560 6952.3205 12005.854 100
# hybrid 44124.099 45856.210 51531.9480 47642.5405 52225.0600 175778.380 100
# outer 106.655 120.711 141.1137 128.9665 143.2465 265.072 100
# outer_c 117.811 137.446 152.5958 142.1315 155.9650 327.101 100
# david 223.125 230.711 257.5622 241.8675 260.6100 920.164 100
So it looks like the options using outer()
are fastest with as.vector()
edging out c()
. @DavidArenburg's solution is also right up with the solutions using outer()
.
I added a hybrid option using dplyr::transmute()
because rev(expand.grid())
was significantly faster thatn crossing()
, which appears to be marginally faster than the straight dplyr route, but still not as fast as the do.call(Vectorize... or the others.
another option (added above) would be to arrange the data frame and create id's using dplyr::row_number() or 1:nrow(). This option would work if all the combinations for p and r are present and unique, but would fail with non-sequential values.
Upvotes: 2
Reputation: 887213
An option with tidyverse
library(dplyr)
library(tidyr)
crossing(p, r) %>%
rowwise %>%
transmute(out = calculateID(p, r)) %>%
pull(out)
#[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Upvotes: 0
Reputation: 101663
Another base R option using do.call
+ Vectorize
+ expand.grid
> do.call(Vectorize(calculateID),unname(rev(expand.grid(r,p))))
[1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Data
p <- c(1, 2, 3, 4, 5)
r <- c(1, 2, 3)
Upvotes: 1
Reputation: 173868
You can use outer
:
calculateIDVectorized <- function(p, r) as.vector(t(outer(p, r, calculateID)))
calculateIDVectorized(c(1, 2, 3, 4, 5), c(1, 2, 3))
#> [1] 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Upvotes: 3