dplyr::distinct - keep only selected columns, not all

Question

When using dplyr::distinct, I want to keep only selected columns rather than all of them (.keep_all = TRUE). Currently I drop the unwanted columns post hoc with select:

library(dplyr)

foo_df <- data.frame(id1=c(1,1,3),id2=c(1,1,4), val1 = letters[1:3], val2 = letters[3:5])

foo_df %>% distinct(id1,id2,.keep_all = TRUE) %>% select(id1,id2, val1)

# I want to keep "val1" and the identifiers for unique combinations

#>   id1 id2 val1
#> 1   1   1    a
#> 2   3   4    c

#> packageVersion('dplyr')
#> [1] ‘0.7.7’

Created on 2018-12-19 by the reprex package (v0.2.1)

But is there a more succinct way? Happy to be pointed to another function too.

Shame on me if this is a dupe.

phiver · Accepted Answer

Maybe the data.table syntax is more to your liking. It is more succinct than dplyr.

library(data.table)

DT <- data.table(foo_df)

# ?data.table::unique
unique(DT[, .(id1, id2, val1)], by = c("id1", "id2"))

   id1 id2 val1
1:   1   1    a
2:   3   4    c
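
For comparison, the same idea can be sketched in base R without extra packages. This is only a minimal sketch, assuming the foo_df from the question; the names is_dup and res are made up for illustration. duplicated() flags rows whose (id1, id2) combination has already appeared, and the column subset is taken in the same indexing step:

```r
# Same example data as in the question
foo_df <- data.frame(id1 = c(1, 1, 3), id2 = c(1, 1, 4),
                     val1 = letters[1:3], val2 = letters[3:5])

# TRUE for every row whose (id1, id2) pair was already seen in an earlier row
is_dup <- duplicated(foo_df[, c("id1", "id2")])

# Keep the first occurrence of each pair and only the wanted columns
res <- foo_df[!is_dup, c("id1", "id2", "val1")]
res
```

As with distinct(.keep_all = TRUE), the first row of each id combination wins, so val1 comes from that first row. The original row names are kept in the result.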
