Reputation: 509

How to keep only unique rows but ignore a column?

If I have this data:

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
       ID = c(1, 2, 3, 4, 5),
       is_fruit = c("yes", "yes", "yes", "yes", "yes"))

and I want to keep only the unique rows while ignoring the ID column, so that the output looks like this:

df2 <- data.frame(name = c("apple", "orange"),
       ID = c(1, 4),
       is_fruit = c("yes", "yes"))

df2
#    name ID is_fruit
#1  apple  1      yes
#2 orange  4      yes

How can I do this, ideally with dplyr?

Upvotes: 4

Answers (3)

ikk89

Reputation: 35

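You can use dplyr's distinct() with across(). Select every column except ID, and set .keep_all = TRUE so that the ID column is still returned:

library(dplyr)
df1 |> distinct(across(-ID), .keep_all = TRUE)

With dplyr 1.1.0 or later, pick(-ID) is the recommended replacement for across(-ID) here.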

Upvotes: 0

d.b

Reputation: 32558

Base R

df1[!duplicated(df1[!names(df1) %in% c("ID")]),]
#    name ID is_fruit
#1  apple  1      yes
#4 orange  4      yes

Replace c("ID") with the names of the columns you want to ignore.
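The same base R idea can be wrapped in a small helper that accepts any set of columns to ignore. This is only a sketch: distinct_ignoring() is a hypothetical name, not a function from base R or any package.

```r
# Hypothetical helper: keep the first row of each group of rows that are
# identical once the columns named in `ignore` are left out.
distinct_ignoring <- function(df, ignore) {
  # Columns that define "uniqueness": everything except the ignored ones
  key_cols <- setdiff(names(df), ignore)
  # duplicated() flags every repeat after the first occurrence
  df[!duplicated(df[key_cols]), , drop = FALSE]
}

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

res <- distinct_ignoring(df1, "ID")
res
#    name ID is_fruit
#1  apple  1      yes
#4 orange  4      yes
```

Passing fromLast = TRUE to duplicated() would keep the last row of each group instead of the first.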

Upvotes: 4

akuiper

Reputation: 215117

You can use the distinct() function. By listing the variables explicitly, you keep rows that are unique on those columns only, and .keep_all = TRUE retains the remaining columns. From ?distinct:

If there are multiple rows for a given combination of inputs, only the first row will be preserved

library(dplyr)
distinct(df1, name, is_fruit, .keep_all = TRUE)
#    name ID is_fruit
#1  apple  1      yes
#2 orange  4      yes
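Because distinct() keeps the first row of each combination, the row order decides which ID survives. As a sketch, sorting by descending ID beforehand keeps the highest ID per fruit instead:

```r
library(dplyr)

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

# distinct() keeps the first row per (name, is_fruit) combination,
# so sorting by descending ID first makes it keep the highest ID instead
last_rows <- df1 |>
  arrange(desc(ID)) |>
  distinct(name, is_fruit, .keep_all = TRUE)

last_rows
```

Here this keeps orange with ID 5 and apple with ID 3, in that order; add another arrange() afterwards if the original ordering matters.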

Upvotes: 8
