Reputation: 35
I think I ran into a situation that (to the best of my knowledge) is not fully covered by the awesome dplyr library, so I guess it will need a bit more coding than I am capable of. I have the following two data frames:
df1 =
Col1 Col2 Col3
1 A 1 X
2 A 1 X
3 B 1 X
4 C 1 X
5 D 1 Y
6 D 1 Z
df2 =
Col1 Col2 Col3
1 A 2 X
2 B 2 Y
3 C 2 Y
4 G 2 Z
5 H 2 X
6 I 2 Z
I want only the rows whose Col1 values appear in both data frames, that is:
out =
Col1 Col2 Col3
1 A 1 X
2 A 1 X
3 B 1 X
4 C 1 X
5 A 2 X
6 B 2 Y
7 C 2 Y
It looks like dplyr::intersect would do it, but since Col2 and Col3 have different values, it gives me a table with 0 rows. Your guidance is much appreciated. Thanks. P. Perez.
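For reference, here is a minimal reconstruction of the two data frames from the printed tables above, plus the dplyr::intersect call that comes up empty (intersect compares complete rows, so the differing Col2 and Col3 values leave no row in common):
library(dplyr)

# Rebuilt from the printed tables; the exact construction is an assumption
df1 <- data.frame(Col1 = c("A", "A", "B", "C", "D", "D"),
                  Col2 = 1,
                  Col3 = c("X", "X", "X", "X", "Y", "Z"),
                  stringsAsFactors = FALSE)
df2 <- data.frame(Col1 = c("A", "B", "C", "G", "H", "I"),
                  Col2 = 2,
                  Col3 = c("X", "Y", "Y", "Z", "X", "Z"),
                  stringsAsFactors = FALSE)

# intersect() matches on every column, hence the zero-row result:
dplyr::intersect(df1, df2)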
Upvotes: 2
Views: 189
Reputation: 83215
With base R you could do:
common <- intersect(df1$Col1, df2$Col1)
df3 <- rbind(df1, df2)
df3[df3$Col1 %in% common, ]
which gives:
Col1 Col2 Col3
1 A 1 X
2 A 1 X
3 B 1 X
4 C 1 X
11 A 2 X
21 B 2 Y
31 C 2 Y
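For clarity, with the character columns built above, the intermediate common vector holds just the shared Col1 values:
common
# [1] "A" "B" "C"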
And with dplyr:
bind_rows(df1, df2) %>%
filter(Col1 %in% intersect(df1$Col1, df2$Col1))
which will give you the same output. An alternative by @Frank from the comments:
bind_rows(df1, df2, .id = "id") %>%
group_by(Col1) %>%
filter(n_distinct(id) == 2L)
The logic behind this is that you bind the two data frames together and simultaneously add an id column via the .id parameter. Then group by the values of Col1 and check how many unique ids there are for each value. The values with only one unique id don't appear in both data frames.
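To see that logic at an intermediate step, this sketch (reusing df1 and df2 from above) counts the distinct source ids per Col1 value; only A, B and C reach two sources and therefore survive the filter:
bind_rows(df1, df2, .id = "id") %>%
  group_by(Col1) %>%
  summarise(n_sources = n_distinct(id))
# A, B and C get n_sources == 2; D, G, H and I get 1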
A similar logic can be applied with the data.table package:
library(data.table)
rbindlist(list(df1, df2), idcol = 'id')[, if (uniqueN(id) == 2L) .SD, by = Col1]
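Here rbindlist() with idcol = 'id' plays the role of bind_rows() with .id = "id", uniqueN(id) counts the distinct source ids within each Col1 group, and .SD returns the group's rows only when that count is 2. The same per-group count can be inspected directly, as a sketch on the data built above:
rbindlist(list(df1, df2), idcol = "id")[, .(n_sources = uniqueN(id)), by = Col1]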
Upvotes: 1