Reputation: 1970
Wrangling data in R, I would like to mutate
a tibble in such a way that the numerical values in the new column are being looked up in a different tibble.
Given a dataset of catheter removals:
# A tibble: 51 x 2
ExplYear RemovalReason
<dbl> <chr>
1 2018 Infection
2 2018 Dysfunction
3 2018 Infection
# ... etc.
where each row corresponds to a single catheter removal, I would like to add a column Implants
that holds the total number of _im_plantations in the year that the catheter was removed (_ex_planted).
The implantation numbers are in a tibble impl_per_year
:
# A tibble: 13 x 2
ImplYear n
<dbl> <int>
1 2006 14
2 2007 46
3 2008 64
# ... etc.
I have tried to mutate
the first tibble with map
and a helper function:
lookup = function(year) { impl_per_year[impl_per_year$ImplYear == year,]$n }
explants %>% mutate(Implants = map(ExplYear, lookup)
But this places lots of empty integer vectors into the Implants
column:
# A tibble: 51 x 3
ExplYear RemovalReason Implants
<dbl> <chr> <list>
1 18 Infection <int [0]>
2 18 Dysfunction <int [0]>
3 18 Infection <int [0]>
# ... etc.
What is the mistake?
Upvotes: 1
Views: 1295
Reputation: 1486
library(tidyverse)
I altered your data so that my illustration wouldn't have a NULL
output.
df <- tribble(
~ExplYear, ~RemovalReason,
2018, "Infection",
2017, "Dysfunction",
2016, "Infection")
impl_per_year <- tribble(
~ImplYear, ~n,
2017, 14,
2016, 46,
2016, 64
)
left_join
is the function you're looking for. It's part of the dplyr::join
family of functions that do this.
It's good to have the same names for "joining" variables, but in your case you need the by = c( ... )
option to let left_join
know what you are joining by.
left_join(df, impl_per_year, by = c("ExplYear" = "ImplYear"))
# A tibble: 4 x 3
ExplYear RemovalReason n
<dbl> <chr> <dbl>
1 2018 Infection NA
2 2017 Dysfunction 14
3 2016 Infection 46
4 2016 Infection 64
Depending on what you want, consider right_join
, inner_join
, etc. until you get the output you are looking for. For example:
inner_join(df, impl_per_year, by = c("ExplYear" = "ImplYear"))
# A tibble: 3 x 3
ExplYear RemovalReason n
<dbl> <chr> <dbl>
1 2017 Dysfunction 14
2 2016 Infection 46
3 2016 Infection 64
... which gives only successful matches from both tibbles.
Upvotes: 0
Reputation: 56
You should be able to simply join the two tables by year. If we call your first tibble ExplTibble
and your second ImplTibble
, using dplyr
:
ExplTibble %>% left_join(ImplTibble, by = c("ExplYear" = "ImplYear"))
This should add a new column n
containing the number of implants in each year.
Upvotes: 4