Justin
Justin

Reputation: 351

R Join without duplicates

Currently when joining two datasets (of different years) I get duplicates of the second one when there are less observations in the second one than the first.

Below, ID 1 only has 1 observation in year y, but it gets repeated because the first dataset of year x has three observations in total. I don't want the duplicates, but simply NAs.

So what I currently get is this:

ID Value.x   N.x Value.y   N.y
  <dbl> <chr>   <dbl> <chr>   <dbl>
1     1 A           6 A           2
2     1 B           7 A           2
3     1 C           1 A           2

What I want is:

ID Value.x   N.x Value.y   N.y
  <dbl> <chr>   <dbl> <chr>   <dbl>
1     1 A           6 A           2
2     1 B           7 NA           NA
3     1 C           1 NA           NA

The end result is that my manager can tell in year x customer 1 ordered A, B, C in n.x quantities. In year y they only ordered A in n.y quantities.

Data:

structure(list(ID = c(1, 1, 1), Value = c("A", "B", "C"), N = c(6, 
7, 1)), class = c("tbl_df", "tbl", "data.frame"), row.names = c(NA, 
-3L))

structure(list(ID = 1, Value = "A", N = 2), class = c("tbl_df", 
"tbl", "data.frame"), row.names = c(NA, -1L))

Upvotes: 0

Views: 52

Answers (1)

an_ja
an_ja

Reputation: 427

I would do it like this:

merge(tbl_df1, tbl_df2, by = c("ID", "Value"), all = TRUE)

Upvotes: 2

Related Questions