Reputation: 131
I want to remove every row that has a duplicate (all copies, not just the later ones) from a data frame, where rows count as duplicates if they match on some, but not all, columns.
The code below comes close to what I want, but it only treats rows as duplicates when the entire row matches, not just certain columns (variables).
df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
How would I modify the code to specify columns/variables, as can be done with the distinct function?
df.unique <- distinct(df, var1, var2, var3)
Upvotes: 0
Views: 321
Reputation: 887531
If duplicates are defined by a set of columns, pass only those columns to duplicated, then use the resulting logical vector to subset the full data frame:
nm1 <- paste0("var", 1:3)
df[!(duplicated(df[nm1]) | duplicated(df[nm1], fromLast = TRUE)), ]
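To see what this does, here is a small sketch on made-up data. The data frame contents and the names other, is_dup and result are assumptions for illustration only; just var1 to var3 and nm1 come from the code above. Rows 1 and 3 match on the three key columns but differ in other, so both should be dropped.

```r
# Hypothetical example data: rows 1 and 3 agree on var1..var3 but not on `other`
df <- data.frame(
  var1  = c(1, 2, 1, 3),
  var2  = c("a", "b", "a", "c"),
  var3  = c(TRUE, FALSE, TRUE, TRUE),
  other = c(10, 20, 30, 40)
)

nm1 <- paste0("var", 1:3)

# Scanning forward flags later copies; scanning with fromLast = TRUE flags
# earlier copies. Combining them with | flags every row in a duplicated group.
is_dup <- duplicated(df[nm1]) | duplicated(df[nm1], fromLast = TRUE)

# Keep only rows whose key columns occur exactly once; all columns are retained
result <- df[!is_dup, ]
print(result)
```

Only rows 2 and 4 should remain, with the other column intact, since the key columns are used for the comparison but not for the final selection.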
Upvotes: 2