Reputation: 23
I have a large data frame and need to calculate values for combinations of columns, grouped by an id column. I would like to do this within the
tidyverse. My current approach works, but it is not very elegant and seems error-prone. Perhaps someone could help me out.
Here is a minimal working example similar to the actual data.
library(tidyverse)

df <- tibble(
  id_combo = c("A_A1", "A_A1", "A_A2", "A_A2", "A_A2"),
  f1 = runif(5),
  f2 = runif(5),
  f3 = runif(5),
  b1 = runif(5),
  b2 = runif(5),
  b3 = runif(5)
)

# Sum of var(log(x)) over f1 and f2, per id_combo
f1_f2 <- df %>%
  split(.$id_combo) %>%
  map_dbl(., ~var(log(.$f1))+var(log(.$f2)))

# Same, over f1 to f3
f1_f3 <- df %>%
  split(.$id_combo) %>%
  map_dbl(., ~var(log(.$f1))+var(log(.$f2))+var(log(.$f3)))

# Same, over f1, f2, b1 and b2
f1_b2 <- df %>%
  split(.$id_combo) %>%
  map_dbl(., ~var(log(.$f1))+var(log(.$f2))+
            var(log(.$b1))+var(log(.$b2)))

# Same, over all six columns
f1_b3 <- df %>%
  split(.$id_combo) %>%
  map_dbl(., ~var(log(.$f1))+var(log(.$f2))+var(log(.$f3))+
            var(log(.$b1))+var(log(.$b2))+var(log(.$b3)))

var_sum_df <- tibble(id_combo = names(f1_f2), f1_f2, f1_f3, f1_b2, f1_b3)
What I hope to achieve is to run the map_dbl function (or a sensible equivalent) after split(.$id_combo), specifying the columns on the fly.
I am sure this is possible, but my R knowledge is not yet advanced enough to figure it out myself.
Upvotes: 0
Views: 786
Reputation: 11981
I am not sure if I understood the question correctly, but is this what you are looking for?
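library(tidyverse)

df %>%
  group_by(id_combo) %>%
  # var(log(x)) of every non-grouping column, per id_combo
  summarise_all(~var(log(.x))) %>%
  # Build the sums cumulatively from the per-column variances.
  # Note: f1_b2 as written here also includes f3; to match the question's
  # f1_b2 (f1, f2, b1, b2 only), use f1_f2 + b1 + b2 instead.
  mutate(f1_f2 = f1 + f2,
         f1_f3 = f1_f2 + f3,
         f1_b2 = f1_f3 + b1 + b2,
         f1_b3 = f1_b2 + b3) %>%
  # Keep the id and the derived columns (the only other names containing "_")
  select(id_combo, contains("_"))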
# A tibble: 2 x 5
  id_combo  f1_f2  f1_f3 f1_b2 f1_b3
  <chr>     <dbl>  <dbl> <dbl> <dbl>
1 A_A1     0.0582 0.0701  1.24  6.89
2 A_A2     2.43   2.57    3.50  3.76
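If the column combinations need to be specified on the fly, one possible variation is to keep them in a named list and sum the matching per-group variances. This is only a sketch under a few assumptions: the names col_sets, log_vars and sums are made up for illustration, set.seed() is added purely so the example is reproducible, and across() (dplyr 1.0.0 or later) stands in for summarise_all().

```r
library(dplyr)
library(purrr)

set.seed(1)  # assumption: fixed seed, only to make the sketch reproducible

df <- tibble(
  id_combo = c("A_A1", "A_A1", "A_A2", "A_A2", "A_A2"),
  f1 = runif(5), f2 = runif(5), f3 = runif(5),
  b1 = runif(5), b2 = runif(5), b3 = runif(5)
)

# Hypothetical lookup: output column name -> input columns to combine.
# Editing this list is the only change needed for a new combination.
col_sets <- list(
  f1_f2 = c("f1", "f2"),
  f1_f3 = c("f1", "f2", "f3"),
  f1_b2 = c("f1", "f2", "b1", "b2"),
  f1_b3 = c("f1", "f2", "f3", "b1", "b2", "b3")
)

# var(log(x)) of every non-grouping column, computed once per id_combo
log_vars <- df %>%
  group_by(id_combo) %>%
  summarise(across(everything(), ~ var(log(.x))), .groups = "drop")

# For each column set, add up the corresponding variance columns row-wise
sums <- map(col_sets, ~ unname(rowSums(log_vars[.x])))

var_sum_df <- bind_cols(select(log_vars, id_combo), as_tibble(sums))
var_sum_df
```

Here f1_b2 follows the question's definition (f1, f2, b1 and b2, without f3). Because the variances are computed once and only then combined, adding another combination should not require touching the grouping step.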
Upvotes: 1