Reputation: 67
I have a dataset where I want to select only one row for each individual each year - however, I would like to mutate a column so that if it says 'yes' for any of that persons rows then all the rows say 'yes'.
This is an example of the dataset I have:
So where the name, clinic and year are the same, I want the tested column to say 'yes' if any of the other rows for that grouping say 'yes'.
Therefore, this is what I would want the dataset to finally look like:
Upvotes: 0
Views: 64
Reputation: 7075
I have included an extra patient who visited two different clinics to demonstrate what happens in those cases as well. This dplyr
solution works by checking if any()
values in each group_by()
are "yes", and then modifies the tested values if true.
library(dplyr)
df <- data.frame(name = paste0("patient ", rep(c("x", "xy", "xyz", "abc"), each = 2)),
clinic = c(rep(c("xxy", "ggf", "ffr"), each = 2), "hhx", "hhy"),
year = c(rep(c(2022, 2019), each = 2), 2018, 2019, 2017, 2017),
date = month.name[c(4,5,1,2,3,5,6,7)],
tested = c("yes", "no", "no", "yes", "yes", "no", "yes", "no"))
result <- df %>%
group_by(name, clinic, year) %>%
mutate(tested = case_when(any(tested == "yes") ~ "yes", TRUE ~ tested)) %>%
ungroup()
result
# A tibble: 8 × 5
name clinic year date tested
<chr> <chr> <dbl> <chr> <chr>
1 patient x xxy 2022 April yes
2 patient x xxy 2022 May yes
3 patient xy ggf 2019 January yes
4 patient xy ggf 2019 February yes
5 patient xyz ffr 2018 March yes
6 patient xyz ffr 2019 May no
7 patient abc hhx 2017 June yes
8 patient abc hhy 2017 July no
Upvotes: 1
Reputation: 783
This is quite straightforward using dplyr
. Here is an option:
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
df <- tribble(
~ name, ~ clinic, ~ year, ~ date, ~ tested,
"a", "xxy", 2022, "April", "yes",
"a", "xxy", 2022, "May", "no",
"b", "ggf", 2019, "Jan", "no",
"b", "ggf", 2019, "Feb", "yes",
"c", "ffr", 2018, "March", "yes",
"c", "ffr", 2019, "May", "no"
)
df |>
mutate(tested2 = if_else(any(tested == "yes"), "yes", "no"), .by = c(name, year))
#> # A tibble: 6 × 6
#> name clinic year date tested tested2
#> <chr> <chr> <dbl> <chr> <chr> <chr>
#> 1 a xxy 2022 April yes yes
#> 2 a xxy 2022 May no yes
#> 3 b ggf 2019 Jan no yes
#> 4 b ggf 2019 Feb yes yes
#> 5 c ffr 2018 March yes yes
#> 6 c ffr 2019 May no no
Created on 2024-02-25 with reprex v2.1.0
I would recommend to read this question before posting future questions. It makes easier to help you.
Upvotes: 1