Reputation: 385
Let's imagine that we have a data frame describing several individuals:
des <- c('mad', 'crazy','stupid', 'crazy','wise','dumb','mad','furious')
id <- c(1,2,3,4,5,6,7,8)
d <-data.frame(id,des)
d$dangerous <- NA
dan <-c('mad','crazy','furious')
We want to match d$des against the descriptions in the vector dan. I wrote the following loop:
for (i in 1:nrow(d)) {
  for (j in 1:length(dan)) {
    if (d$des[i] == dan[j]) {
      d$dangerous[i] <- 1
    }
  }
}
d
  id     des dangerous
1  1     mad         1
2  2   crazy         1
3  3  stupid        NA
4  4   crazy         1
5  5    wise        NA
6  6    dumb        NA
7  7     mad         1
8  8 furious         1
The code works, but I wonder how to optimize it so that it can handle longer vectors and larger data frames. Any ideas?
Upvotes: 1
Views: 44
Reputation: 76575
Here are timings of the solutions posted so far, plus one of my own.
I have timed the functions on the original data.frame d and on a bigger data.frame, since the OP says it's an optimization problem.
OP <- function(DF, dan){
DF$dangerous <- NA
for (i in 1:nrow(DF)){
for(j in 1:length(dan)){
if (DF$des[i]==dan[j]) DF$dangerous[i] <- 1
}
}
DF
}
Carles <- function(DF, dan){
DF$dangerous<-ifelse(DF$des %in% dan, 1, NA)
DF
}
arg0naut91_1 <- function(DF, dan){
DF$dangerous <- NA
transform(DF, dangerous = replace(dangerous, des %in% dan, 1))
}
arg0naut91_2 <- function(DF, dan){
DF$dangerous <- NA
DF$dangerous[DF$des %in% dan] <- 1
DF
}
Rui <- function(DF, dan){
DF$dangerous <- c(NA, 1)[(DF$des %in% dan) + 1]  # FALSE -> index 1 (NA), TRUE -> index 2 (1)
DF
}
library(microbenchmark)
mb <- microbenchmark(
OP = OP(d, dan),
Carles = Carles(d, dan),
Rui = Rui(d, dan),
arg0naut91_1 = arg0naut91_1(d, dan),
arg0naut91_2 = arg0naut91_2(d, dan)
)
print(mb, order = "median")
#Unit: microseconds
#         expr     min       lq      mean   median       uq       max neval cld
#          Rui  22.623  25.1865  82.73746  27.2510  31.6630  5441.491   100  a
#       Carles  31.740  34.4120  76.82339  36.9385  42.1760  3753.407   100  a
# arg0naut91_2  34.131  36.7140  89.10827  39.5925  46.6930  4577.938   100  a
# arg0naut91_1 226.237 230.1020 296.23198 234.6225 243.3040  4847.553   100  a
#           OP 757.831 770.1875 926.88995 781.5630 818.2745 10992.040   100   b
e <- d
for(i in 1:10) e <- rbind(e, e)
mb2 <- microbenchmark(
OP = OP(e, dan),
Carles = Carles(e, dan),
Rui = Rui(e, dan),
arg0naut91_1 = arg0naut91_1(e, dan),
arg0naut91_2 = arg0naut91_2(e, dan),
times = 10
)
print(mb2, order = "median")
#Unit: microseconds
#         expr        min         lq        mean      median         uq        max neval cld
#          Rui    291.090    294.690    346.3638    298.9580    301.238    776.769    10  a
# arg0naut91_2    288.123    292.236    312.6684    311.2435    314.495    388.212    10  a
#       Carles    427.500    430.120    447.7170    450.2570    453.884    480.424    10  a
# arg0naut91_1    513.059    517.822    611.0255    666.7095    670.059    688.023    10  a
#           OP 898781.320 909717.469 911988.3906 914269.7245 916975.858 919223.886    10   b
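Timings are only meaningful if the functions agree on the result, so a quick sanity check may be worth running first. The sketch below rebuilds the example data and compares the nested loop with a vectorised %in% assignment. The names loop_flag and vec_flag are made up for this illustration and do not appear in the benchmark above.

```r
# Rebuild the example data from the question
d <- data.frame(
  id  = 1:8,
  des = c("mad", "crazy", "stupid", "crazy", "wise", "dumb", "mad", "furious"),
  stringsAsFactors = FALSE
)
dan <- c("mad", "crazy", "furious")

# Hypothetical helper: nested-loop approach, one comparison per (row, dan) pair
loop_flag <- function(DF, dan) {
  DF$dangerous <- NA
  for (i in seq_len(nrow(DF))) {
    for (j in seq_along(dan)) {
      if (DF$des[i] == dan[j]) DF$dangerous[i] <- 1
    }
  }
  DF
}

# Hypothetical helper: vectorised approach, one %in% call for the whole column
vec_flag <- function(DF, dan) {
  DF$dangerous <- NA
  DF$dangerous[DF$des %in% dan] <- 1
  DF
}

# Compare the resulting columns element by element, NAs included
same <- identical(loop_flag(d, dan)$dangerous, vec_flag(d, dan)$dangerous)
print(same)
```

The same comparison can be repeated for any of the other variants before benchmarking them.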
Upvotes: 2
Reputation: 14764
Another option:
transform(d, dangerous = replace(dangerous, des %in% dan, 1))
  id     des dangerous
1  1     mad         1
2  2   crazy         1
3  3  stupid        NA
4  4   crazy         1
5  5    wise        NA
6  6    dumb        NA
7  7     mad         1
8  8 furious         1
Or:
d$dangerous[d$des %in% dan] <- 1
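As for why this scales better than a double loop: %in% is a thin wrapper around match(), which looks up every element of the left-hand vector in one call instead of comparing pairs one at a time. A minimal sketch of that relationship, using the values from the question (pos and hit are illustrative names):

```r
des <- c("mad", "crazy", "stupid", "crazy", "wise", "dumb", "mad", "furious")
dan <- c("mad", "crazy", "furious")

# Position of each des value within dan, or NA when it is absent
pos <- match(des, dan)

# %in% answers the question "did match() find a position?"
hit <- des %in% dan

print(identical(hit, !is.na(pos)))
```

The logical vector hit is what selects the rows in the subassignment above.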
Upvotes: 1
Reputation: 2829
Using ifelse() with %in% will do the trick:
d$dangerous <- ifelse(d$des %in% dan, 1, NA)
> d
  id     des dangerous
1  1     mad         1
2  2   crazy         1
3  3  stupid        NA
4  4   crazy         1
5  5    wise        NA
6  6    dumb        NA
7  7     mad         1
8  8 furious         1
Upvotes: 2