Reputation: 509
If I have this data:
df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
ID = c(1, 2, 3, 4, 5),
is_fruit = c("yes", "yes", "yes", "yes", "yes"))
and I want to keep only the unique rows, but ignore the ID
column such that the output looks like this:
df2 <- data.frame(name = c("apple", "orange"),
ID = c(1, 4),
is_fruit = c("yes", "yes"))
df2
# name ID is_fruit
#1 apple 1 yes
#2 orange 4 yes
How can I do this, ideally with dplyr?
Upvotes: 4
Views: 2734
Reputation: 32538
Base R
df1[!duplicated(df1[!names(df1) %in% c("ID")]),]
# name ID is_fruit
#1 apple 1 yes
#4 orange 4 yes
Replace c("ID") with the names of the columns you want to ignore.
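If you need this in more than one place, the one-liner above could be wrapped in a small function. This is only a sketch: drop_duplicates_ignoring is a made-up name, not something from base R or dplyr, and it assumes at least one column is left over after removing the ignored ones.

```r
# Hypothetical helper (not part of base R or dplyr): keep the first row for
# each unique combination of all columns except those named in `ignore`.
drop_duplicates_ignoring <- function(df, ignore) {
  # Columns that take part in the duplicate check
  key_cols <- setdiff(names(df), ignore)
  # duplicated() flags every repeat after the first occurrence
  df[!duplicated(df[key_cols]), , drop = FALSE]
}

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

result <- drop_duplicates_ignoring(df1, ignore = "ID")
result
```

As with the one-liner, the original row names are kept, so the surviving rows are labelled by their position in df1.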
Upvotes: 4
Reputation: 214927
You can use the distinct function from dplyr. By specifying the variables explicitly, you keep unique rows based on just those columns, and .keep_all = TRUE retains the remaining columns (here ID). Also, from ?distinct:
If there are multiple rows for a given combination of inputs, only the first row will be preserved
library(dplyr)
distinct(df1, name, is_fruit, .keep_all = TRUE)
# name ID is_fruit
#1 apple 1 yes
#2 orange 4 yes
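To make the "only the first row will be preserved" behaviour explicit, a roughly equivalent way to write it is to group by the columns of interest and take the first row of each group. This is a sketch rather than a replacement for distinct; it assumes dplyr 1.0.0 or later for slice_head, and first_per_group is just an illustrative name. Note that group_by orders the result by group, whereas distinct keeps rows in order of first appearance (the two agree for this data).

```r
library(dplyr)

df1 <- data.frame(name = c("apple", "apple", "apple", "orange", "orange"),
                  ID = c(1, 2, 3, 4, 5),
                  is_fruit = c("yes", "yes", "yes", "yes", "yes"))

# One group per unique (name, is_fruit) pair; keep the first row of each group
first_per_group <- df1 %>%
  group_by(name, is_fruit) %>%
  slice_head(n = 1) %>%
  ungroup()

first_per_group
```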
Upvotes: 8