Reputation: 967
I have a df
id a1 a2
1 x1 y1
2 x2 y2
and another dataframe df2
id name1 name2
1 a1 b1
1 a2 b2
2 a3 b3
3 a4 b4
3 a5 b5
df2 could contain multiple records of unique id's from df1.
I need to join the dataframes in such a way that for each row of df1, i should have one column from first record of df2 and if it exists, second column from second record.
To explain, the output should be like :
id a1 a2 n1 n2
1 x1 y1 a1 a2
2 x2 y2 a3 NA
For doing this I have split df2 on id using split
s <- split(df2, df2$id)
but i'm unsure how to use sapply over that. Any pointers for this
Upvotes: 0
Views: 42
Reputation: 887118
If we are not taking the 'name2' column
library(dplyr)
df2 %>%
filter(id %in% df$id) %>%
select(-name2) %>%
group_by(id) %>%
mutate(rn = paste0("n", row_number())) %>%
spread(rn, name1) %>%
left_join(df, .)
# id a1 a2 n1 n2
#1 1 x1 y1 a1 a2
#2 2 x2 y2 a3 <NA>
df <- structure(list(id = 1:2, a1 = c("x1", "x2"), a2 = c("y1", "y2"
)), .Names = c("id", "a1", "a2"), class = "data.frame", row.names = c(NA,
-2L))
df2 <- structure(list(id = c(1L, 1L, 2L, 3L, 3L), name1 = c("a1", "a2",
"a3", "a4", "a5"), name2 = c("b1", "b2", "b3", "b4", "b5")), .Names = c("id",
"name1", "name2"), class = "data.frame", row.names = c(NA, -5L))
Upvotes: 1