Reputation: 168
I have a table containing the following data:
df <- tibble(
dose = seq(10, 50, 10),
date = c("2007-12-15", "2007-10-13","2007-10-13","2007-09-30","2007-09-30"),
response = c(45, 67, 66, 54, 55),
name = c("Peter,Martin", "Gale,Rebecca", "Rebecca,Gale", "Jonathan,Smith", "Smith,Jonathan")
)
The table:
# A tibble: 5 x 4
dose date response name
<dbl> <chr> <dbl> <chr>
1 10 2007-12-15 45 Peter,Martin
2 20 2007-10-13 67 Gale,Rebecca
3 30 2007-10-13 66 Rebecca,Gale
4 40 2007-09-30 54 Jonathan,Smith
5 50 2007-09-30 55 Smith,Jonathan
One of the columns called name
either has a string "FirstName,LastName" or "LastName,FirstName". I wish to merge the rows that contain the same names if they are ordered either way. For example, the rows containing Rebecca,Gale and Gale,Rebecca should merge.
While merging, I wish to get the sums of the columns dose
and response
and want to keep the first of the date
and name
entries.
Expected outcome:
# A tibble: 3 x 4
dose date response name
<dbl> <chr> <dbl> <chr>
1 10 2007-12-15 45 Peter,Martin
2 50 2007-10-13 133 Gale,Rebecca
3 90 2007-09-30 109 Jonathan,Smith
Please note that I always want to merge using the name
column and not the date
column because even if the example contains the same dates, my bigger table has different dates for the same name.
Upvotes: 3
Views: 844
Reputation: 39154
Here is one idea.
library(tidyverse)
df2 <- df %>%
mutate(date = as.Date(date)) %>%
mutate(name = map_chr(name, ~toString(sort(str_split(.x, ",")[[1]])))) %>%
group_by(name) %>%
summarize(dose = sum(dose),
response = sum(response),
date = first(date)) %>%
select(names(df)) %>%
ungroup()
df2
# # A tibble: 3 x 4
# dose date response name
# <dbl> <date> <dbl> <chr>
# 1 50 2007-10-13 133 Gale, Rebecca
# 2 90 2007-09-30 109 Jonathan, Smith
# 3 10 2007-12-15 45 Martin, Peter
Upvotes: 4