Reputation: 797
I have a df
as below
df <- data.frame(col1 = c("a", "a", "b",
"b", "c", "c"),
col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
col3 = c(1, NA, 2, NA, 3, NA),
col4 = c(NA, 1, NA, 2, NA, 3))
df
col1 col2 col3 col4
1 a x1 1 NA
2 a x1.1 NA 1
3 b x2 2 NA
4 b x2.1 NA 2
5 c x3 3 NA
6 c x3.1 NA 3
I want to merge rows that have the same letter in column col1
and filter rows in column col2
by telling them col2 %in% c(x1,x1.1) & col2 %in% c(x2,x2.1) & col3 %in% (x3,x3.1)
, simulatenously.
My desired output would be:
col1 col2 col3 col4
1 a x1 1 1
2 b x2 2 2
3 c x3 3 3
One solution from my side is to call if x == "x1"
, then col4
will be filled by values assosicated with x == "x1.1"
Any suggestions for this to group_by
col1
? Thank you in advance!
Additional note
I did draw a pic for visualization, that I think would be easier for you to imagine.
The values in the actual dataset are different from the example here.
Updated solution I found a solution with the help of akrun see here: Use ~separate after mutate and across
df |>
mutate(col2 = substring(col2, 1,2)) |>
mutate_if(is.numeric, ~replace(., is.na(.), "")) |>
group_by(col1, col2) |>
summarise(across(c(col3, col4), ~toString(.)), .groups = "drop") |>
mutate(col3 = str_remove(col3, ",")) |>
mutate(col4 = str_remove(col4, ", "))
I'm still open to further suggestions, if anyone has any.
Upvotes: 0
Views: 73
Reputation: 18541
Not sure if I'm missing something, but for your example data the approach below works:
library(dplyr)
df %>%
mutate(col2 = sub("\\.[0-9]+$", "", col2),
col3 = coalesce(col3, col4),
col4 = coalesce(col4, col3)) %>%
distinct()
#> col1 col2 col3 col4
#> 1 a x1 1 1
#> 2 b x2 2 2
#> 3 c x3 3 3
Created on 2022-11-29 by the reprex package (v2.0.1)
Upvotes: 1
Reputation: 4524
After extracting only the first to characters of col2
, you can use coalesce
to bring col3
and col4
together.
Then replacing NA
by data from the coalesced column and using distinct
at the end will lead to the desired output.
library(tidyverse)
df <- data.frame(col1 = c("a", "a", "b",
"b", "c", "c"),
col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
col3 = c(1, NA, 2, NA, 3, NA),
col4 = c(NA, 1, NA, 2, NA, 3))
df |>
mutate(col2 = substring(col2, 1,2)) |>
mutate(mycol = coalesce(col3, col4)) |>
mutate(col3 = if_else(is.na(col3), mycol, col3),
col4 = ifelse(is.na(col4), mycol, col4)) |>
select(-c(mycol)) |>
distinct()
#> col1 col2 col3 col4
#> 1 a x1 1 1
#> 2 b x2 2 2
#> 3 c x3 3 3
Upvotes: 0