Reputation: 25
If my data looks something like this:
species1 species2 info1 info2
Loro Parrot 3 NA
NA Parrot NA 7
Osprey NA NA 89
Sparrow Finch NA 19
Sparrow NA 27 NA
Mallard Duck 69 16
Mallard NA NA NA
NA Swift 25 NA
And i want to merge it together like this:
species1 species2 info1 info2
Loro Parrot 3 7
Osprey NA NA 89
Sparrow Finch 27 19
Mallard Duck 69 16
NA Swift 25 NA
How could i do it, tanking into account that i need to keep the NA records?
Thank you very much! :)
Upvotes: 1
Views: 43
Reputation: 388797
You may group by one column and fill
the NA
values in other one to get the pairs, after which take sum
of the values grouping by both the species column.
library(dplyr)
library(tidyr)
df %>%
group_by(species2) %>%
fill(species1) %>%
group_by(species1) %>%
fill(species2) %>%
group_by(species2, .add = TRUE) %>%
summarise(across(.fns = sum, na.rm = TRUE)) %>%
ungroup
# species1 species2 info1 info2
# <chr> <chr> <int> <int>
#1 Loro Parrot 3 7
#2 Mallard Duck 69 16
#3 Osprey NA 0 89
#4 Sparrow Finch 27 19
#5 NA Swift 25 0
Upvotes: 2
Reputation: 886938
We could use similar approach from the previous post, but in a different way i.e. first create a named vector from the 'species' columns'. Use that to replace the values in first 'species1' column, coalese
with the second one to do a grouping and then do the summarise
library(dplyr)
library(tibble)
nm1 <- df1 %>%
select(species1, species2) %>%
na.omit %>%
deframe
df1 %>%
group_by(species = coalesce(nm1[species1], species2)) %>%
summarise(across(everything(), ~ .[complete.cases(.)][1])) %>%
select(-species)
# A tibble: 5 x 4
species1 species2 info1 info2
<chr> <chr> <int> <int>
1 Mallard Duck 69 16
2 Sparrow Finch 27 19
3 Loro Parrot 3 7
4 <NA> Swift 25 NA
5 Osprey <NA> NA 89
df1 <- structure(list(species1 = c("Loro", NA, "Osprey", "Sparrow",
"Sparrow", "Mallard", "Mallard", NA), species2 = c("Parrot",
"Parrot", NA, "Finch", NA, "Duck", NA, "Swift"), info1 = c(3L,
NA, NA, NA, 27L, 69L, NA, 25L), info2 = c(NA, 7L, 89L, 19L, NA,
16L, NA, NA)), class = "data.frame", row.names = c(NA, -8L))
Upvotes: 3