Reputation: 1792
My data looks like this:
df1
#> Artist Album Year
#> 1 Beatles Sgt. Pepper's 1967
#> 2 Rolling Stones Sticky Fingers 1971
and
df2
#> Artist Members
#> 1 Beatles George
#> 2 Beatles Ringo
#> 3 Beatles Paul
#> 4 Beatles John
And I would like to join these two dfs, in what I believe is an "untidy" fashion. Despite the untidyness, it will be very helpful for me to have the final output look like the example below, where every band (Artist) only takes up one row, and the band members are all put into one column, separated by say commas:
Desired Output
#> Artist Album Members Year
#> 1 Beatles Sgt. Pepper's George, Ringo, Paul, John 1967
#> 2 Rolling Stones Sticky Fingers 1971
I've been able to get close to a solution (below), but:
library(tidyverse)
df1 <- data.frame(stringsAsFactors=FALSE,
Artist = c("Beatles", "Rolling Stones"),
Album = c("Sgt. Pepper's", "Sticky Fingers"),
Year = c(1967, 1971)
)
df2 <- data.frame(stringsAsFactors=FALSE,
Artist = c("Beatles", "Beatles", "Beatles", "Beatles"),
Members = c("George", "Ringo", "Paul", "John")
)
df <- left_join(df1, df2, by = "Artist")
df <- df %>% group_by(Artist) %>% mutate(member_number = seq_along(Members))
df <- spread(df, key = "member_number", value = "Members", sep = "_")
df <- df %>% unite(col = "members", member_number_1:member_number_4, sep = ",")
Which gives the output
df
#> # A tibble: 2 x 4
#> # Groups: Artist [2]
#> Artist Album Year members
#> <chr> <chr> <dbl> <chr>
#> 1 Beatles Sgt. Pepper's 1967 George,Ringo,Paul,John
#> 2 Rolling Stones Sticky Fingers 1971 NA,NA,NA,NA
Upvotes: 1
Views: 61
Reputation: 47340
My package safejoin allows aggregation operations on the joined table by the joining variable :
# devtools::install_github("moodymudskipper/safejoin")
library(safejoin)
library(dplyr)
df1 %>% eat(df2, .agg = toString)
# Joining, by = "Artist"
# Artist Album Year Members
# 1 Beatles Sgt. Pepper's 1967 George, Ringo, Paul, John
# 2 Rolling Stones Sticky Fingers 1971 <NA>
Upvotes: 0
Reputation: 887841
Using data.table
library(data.table)
setDT(df2)[df1, on = .(Artist)][, .(members = toString(Members)),
.(Artist, Album, Year)]
# Artist Album Year members
#1: Beatles Sgt. Pepper's 1967 George, Ringo, Paul, John
#2: Rolling Stones Sticky Fingers 1971 NA
Upvotes: 2
Reputation: 6532
slightly differently:
library(dplyr)
left_join(df1, df2) %>%
group_by(Artist, Album, Year) %>%
summarise(members = paste(Members, collapse = ","))
# A tibble: 2 x 4
# Groups: Artist, Album [?]
Artist Album Year members
<chr> <chr> <dbl> <chr>
1 Beatles Sgt. Pepper's 1967 George,Ringo,Paul,John
2 Rolling Stones Sticky Fingers 1971 NA
Upvotes: 3
Reputation: 389235
We can left_join
and then summarise
multiple columns and collapse them into unique
comma-separated strings.
library(dplyr)
left_join(df1, df2, by = "Artist") %>%
group_by(Artist) %>%
summarise_at(vars(Album:Members), ~toString(unique(.)))
# A tibble: 2 x 4
# Artist Album Year Members
# <chr> <chr> <chr> <chr>
#1 Beatles Sgt. Pepper's 1967 George, Ringo, Paul, John
#2 Rolling Stones Sticky Fingers 1971 NA
Upvotes: 2