Reputation: 2282
I would like to identify the length of measurement was taken during a specific day based on id
and id1
. The measurement length is recorded in the d1...d5
variables and d_day
defines the length of the measurement I am looking for. For example in the first case, I am after d3
measurement length that equals 30
. I tried the which
functions but sure there are faster solutions.
Desired output
Data structure:
Sample data:
structure(list(id = c(101, 101), id_1 = c(1, 2), d1 = c(15, 0
), d2 = c(30, 15), d3 = c(25, 30), d4 = c(15, 30), d5 = c(15,
60), d_day = c("d3", "d1")), class = c("spec_tbl_df", "tbl_df",
"tbl", "data.frame"), row.names = c(NA, -2L), spec = structure(list(
cols = list(id = structure(list(), class = c("collector_double",
"collector")), id_1 = structure(list(), class = c("collector_double",
"collector")), d1 = structure(list(), class = c("collector_double",
"collector")), d2 = structure(list(), class = c("collector_double",
"collector")), d3 = structure(list(), class = c("collector_double",
"collector")), d4 = structure(list(), class = c("collector_double",
"collector")), d5 = structure(list(), class = c("collector_double",
"collector")), d_day = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
Upvotes: 0
Views: 53
Reputation: 39657
As your d_day
contains already the column number it could directly used for subsetting which should be faster than using match
.
cbind(x[1:2], d_day_h = x[3:7][cbind(seq_len(nrow(x)),
as.numeric(substring(x$d_day, 2)))])
# id id_1 d_day_h
#1 101 1 25
#2 101 2 0
Benchmark:
library(dplyr)
library(tidyr)
bench::mark(
Gki = {cbind(x[1:2], d_day_h = as.data.frame(x[3:7])[cbind(seq_len(nrow(x)), as.numeric(substring(x$d_day, 2)))])}
, KarthikS = {as.data.frame(x %>% pivot_longer(-c(id, id_1, d_day)) %>% filter(d_day == name) %>%
select(id, id_1, 'd_day_h' = value))}
, GuedesBF = {as.data.frame(x %>%
rowwise %>%
mutate(d_day_h = get(d_day)) %>%
select(-c(d1:d_day))%>%
ungroup)}
)
# expression min median `itr/sec` mem_alloc `gc/sec` n_itr n_gc
# <bch:expr> <bch:tm> <bch:tm> <dbl> <bch:byt> <dbl> <int> <dbl>
#1 Gki 288.74µs 306.78µs 3198. 0B 12.5 1538 6
#2 KarthikS 9.19ms 9.57ms 101. 25.1KB 11.2 45 5
#3 GuedesBF 6ms 6.52ms 151. 13.5KB 10.9 69 5
Currently method of GKi is about 20 times faster and uses less memory than the method form GuedesBF and about 30 times faster than the method from KarthikS.
Upvotes: 1
Reputation: 9858
A very simple approach with get()
:
library(dplyr)
df %>%
rowwise %>%
mutate(d_day_h = get(d_day)) %>%
select(-c(d1:d_day))%>%
ungroup
# A tibble: 2 x 3
id id_1 d_day_h
<dbl> <dbl> <dbl>
1 101 1 25
2 101 2 0
Upvotes: 1
Reputation: 11584
Does this work:
library(dplyr)
library(tidyr)
df %>% pivot_longer(-c(id, id_1, d_day)) %>% filter(d_day == name) %>%
select(id, id_1, 'd_day_h' = value)
# A tibble: 2 x 3
id id_1 d_day_h
<dbl> <dbl> <dbl>
1 101 1 25
2 101 2 0
Upvotes: 1