Prego
Prego

Reputation: 25

Collapsing together duplicate rows with synonyms

If my data looks something like this:

species1     species2    info1    info2
Loro         Parrot      3        NA 
NA           Parrot      NA       7
Osprey       NA          NA       89
Sparrow      Finch       NA       19
Sparrow      NA          27       NA
Mallard      Duck        69       16
Mallard      NA          NA       NA
NA           Swift       25       NA

And i want to merge it together like this:

species1     species2    info1    info2
Loro         Parrot      3        7
Osprey       NA          NA       89
Sparrow      Finch       27       19
Mallard      Duck        69       16
NA           Swift       25       NA

How could i do it, tanking into account that i need to keep the NA records?

Thank you very much! :)

Upvotes: 1

Views: 43

Answers (2)

Ronak Shah
Ronak Shah

Reputation: 388797

You may group by one column and fill the NA values in other one to get the pairs, after which take sum of the values grouping by both the species column.

library(dplyr)
library(tidyr)

df %>%
  group_by(species2) %>%
  fill(species1) %>%
  group_by(species1) %>%
  fill(species2) %>%
  group_by(species2, .add = TRUE) %>%
  summarise(across(.fns = sum, na.rm = TRUE)) %>%
  ungroup

#  species1 species2 info1 info2
#  <chr>    <chr>    <int> <int>
#1 Loro     Parrot       3     7
#2 Mallard  Duck        69    16
#3 Osprey   NA           0    89
#4 Sparrow  Finch       27    19
#5 NA       Swift       25     0

Upvotes: 2

akrun
akrun

Reputation: 886938

We could use similar approach from the previous post, but in a different way i.e. first create a named vector from the 'species' columns'. Use that to replace the values in first 'species1' column, coalese with the second one to do a grouping and then do the summarise

library(dplyr)
library(tibble)
nm1 <- df1 %>% 
      select(species1, species2) %>% 
      na.omit %>% 
      deframe
df1 %>% 
     group_by(species = coalesce(nm1[species1], species2)) %>% 
     summarise(across(everything(), ~ .[complete.cases(.)][1])) %>% 
     select(-species)
# A tibble: 5 x 4
  species1 species2 info1 info2
  <chr>    <chr>    <int> <int>
1 Mallard  Duck        69    16
2 Sparrow  Finch       27    19
3 Loro     Parrot       3     7
4 <NA>     Swift       25    NA
5 Osprey   <NA>        NA    89

data

df1 <- structure(list(species1 = c("Loro", NA, "Osprey", "Sparrow", 
"Sparrow", "Mallard", "Mallard", NA), species2 = c("Parrot", 
"Parrot", NA, "Finch", NA, "Duck", NA, "Swift"), info1 = c(3L, 
NA, NA, NA, 27L, 69L, NA, 25L), info2 = c(NA, 7L, 89L, 19L, NA, 
16L, NA, NA)), class = "data.frame", row.names = c(NA, -8L))

Upvotes: 3

Related Questions