Reputation: 17678
I am running multiple simulations with the same input parameters. Some simulations complete earlier than others and I need to extend the results of the shorter simulations so that I can analyse the data with all runs included. This means filling up 'short' runs with repeats of the final values until they are the same length as the 'long' runs with the same input parameters.
I would like a dplyr solution because the real datasets are massive and dplyr has fast joins.
Here is my attempt.
library(dplyr)
sims <- data.frame("run" = c(1, 1, 1, 2, 2, 3, 3),
"type" = c("A", "A", "A", "A", "A", "B", "B"),
"step" = c(0, 1, 2, 0, 1, 0, 1),
"value" = seq(1:7))
allSteps <- data.frame("type" = c("A", "A", "A", "B", "B"),
"step" = c(0, 1, 2, 0, 1))
merged <- full_join(sims, allSteps,
by = c("type", "step"))
This gets the output:
run type step value
1 A 0 1
1 A 1 2
1 A 2 3
2 A 0 4
2 A 1 5
3 B 0 6
3 B 1 7
But I actually want the following because run 2 is of type A and should therefore be expanded to the same length as run 1 (also type A):
run type step value
1 A 0 1
1 A 1 2
1 A 2 3
2 A 0 4
2 A 1 5
2 A 2 NA # extra line here
3 B 0 6
3 B 1 7
I will then use fill to get to my desired result of:
run type step value
1 A 0 1
1 A 1 2
1 A 2 3
2 A 0 4
2 A 1 5
2 A 2 5 # filled replacement of NA
3 B 0 6
3 B 1 7
I am sure this is a duplicate of some question but the various search terms I used didn't manage to surface it.
Upvotes: 3
Views: 76
Reputation: 47320
We can split the data frame by run and do a right_join
on allSteps
for each of them to have all the combinations you desire. Then we join back and fill.
It's a bit more general than current solutions in that you could have steps in allSteps
that may not be in sims
or in the sims
subset you're working on.
library(tidyverse)
sims %>%
split(.$run) %>%
map_dfr(right_join,allSteps,.id = "id") %>%
group_by(type,id) %>%
fill(run,value,.direction="down") %>%
ungroup %>%
filter(!is.na(run)) %>%
select(-id)
# # A tibble: 8 x 4
# run type step value
# <dbl> <fctr> <dbl> <int>
# 1 1 A 0 1
# 2 1 A 1 2
# 3 1 A 2 3
# 4 2 A 0 4
# 5 2 A 1 5
# 6 2 A 2 5
# 7 3 B 0 6
# 8 3 B 1 7
Upvotes: 0
Reputation: 56169
Using tidyr::complete to get missing combinations, then use fill to fill NAs with last non-NA value:
library(tidyr)
sims %>%
group_by(type) %>%
complete(run, step) %>%
select(run, type, step, value) %>%
ungroup() %>%
fill(value)
# # A tibble: 8 x 4
# run type step value
# <dbl> <fct> <dbl> <int>
# 1 1.00 A 0 1
# 2 1.00 A 1.00 2
# 3 1.00 A 2.00 3
# 4 2.00 A 0 4
# 5 2.00 A 1.00 5
# 6 2.00 A 2.00 5
# 7 3.00 B 0 6
# 8 3.00 B 1.00 7
Upvotes: 1
Reputation: 24188
We don't really need the data.frame allSteps
if at least one of the runs contains the full sequence for each type. Instead we can use tidyr::expand()
in combination with a self-join:
library(tidyr)
sims %>% group_by(type) %>%
expand(run, step) %>%
full_join(sims, by = c("type", "step", "run")) %>%
select(2,1,3,4)
# run type step value
# <dbl> <fctr> <dbl> <int>
#1 1 A 0 1
#2 1 A 1 2
#3 1 A 2 3
#4 2 A 0 4
#5 2 A 1 5
#6 2 A 2 NA
#7 3 B 0 6
#8 3 B 1 7
Upvotes: 2