Anh
Anh

Reputation: 797

Aggregate rows with multiple conditions

I have a df as below

df <- data.frame(col1 = c("a", "a", "b",
                          "b", "c", "c"),
                 col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
                 col3 = c(1, NA, 2, NA, 3, NA),
                 col4 = c(NA, 1, NA, 2, NA, 3))
df
  col1 col2 col3 col4
1    a   x1    1   NA
2    a x1.1   NA    1
3    b   x2    2   NA
4    b x2.1   NA    2
5    c   x3    3   NA
6    c x3.1   NA    3

I want to merge rows that have the same letter in column col1 and filter rows in column col2 by telling them col2 %in% c(x1,x1.1) & col2 %in% c(x2,x2.1) & col3 %in% (x3,x3.1), simulatenously.

My desired output would be:

  col1 col2 col3 col4
1    a   x1    1   1
2    b   x2    2   2
3    c   x3    3   3

One solution from my side is to call if x == "x1", then col4 will be filled by values assosicated with x == "x1.1"

Any suggestions for this to group_by col1? Thank you in advance!

Additional note

I did draw a pic for visualization, that I think would be easier for you to imagine.

enter image description here

The values in the actual dataset are different from the example here.

Updated solution I found a solution with the help of akrun see here: Use ~separate after mutate and across

df |> 
  mutate(col2 = substring(col2, 1,2)) |> 
  mutate_if(is.numeric, ~replace(., is.na(.), "")) |> 
  group_by(col1, col2) |> 
  summarise(across(c(col3, col4), ~toString(.)), .groups = "drop") |> 
  mutate(col3 = str_remove(col3, ",")) |> 
  mutate(col4 = str_remove(col4, ", "))

I'm still open to further suggestions, if anyone has any.

Upvotes: 0

Views: 73

Answers (2)

TimTeaFan
TimTeaFan

Reputation: 18541

Not sure if I'm missing something, but for your example data the approach below works:

library(dplyr)

df %>% 
  mutate(col2 = sub("\\.[0-9]+$", "", col2),
         col3 = coalesce(col3, col4),
         col4 = coalesce(col4, col3)) %>% 
  distinct()
#>   col1 col2 col3 col4
#> 1    a   x1    1    1
#> 2    b   x2    2    2
#> 3    c   x3    3    3

Created on 2022-11-29 by the reprex package (v2.0.1)

Upvotes: 1

MarBlo
MarBlo

Reputation: 4524

After extracting only the first to characters of col2, you can use coalesce to bring col3 and col4 together. Then replacing NA by data from the coalesced column and using distinct at the end will lead to the desired output.

library(tidyverse)

df <- data.frame(col1 = c("a", "a", "b",
                          "b", "c", "c"),
                 col2 = c("x1", "x1.1", "x2", "x2.1", "x3", "x3.1"),
                 col3 = c(1, NA, 2, NA, 3, NA),
                 col4 = c(NA, 1, NA, 2, NA, 3))


df |> 
  mutate(col2 = substring(col2, 1,2)) |> 
  mutate(mycol = coalesce(col3, col4)) |>
  mutate(col3 = if_else(is.na(col3), mycol, col3),
         col4 = ifelse(is.na(col4), mycol, col4)) |> 
  select(-c(mycol)) |> 
  distinct()
#>   col1 col2 col3 col4
#> 1    a   x1    1    1
#> 2    b   x2    2    2
#> 3    c   x3    3    3

Upvotes: 0

Related Questions