user303287

Reputation: 131

Remove both duplicates based on multiple (but not all) columns

I want to remove both (all) sets of duplicate rows within a dataframe, where rows are duplicated by some but not all columns.

The code below comes close to what I want, but it only treats rows as duplicates when the entire row matches, not just certain columns (variables).

df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]

How would I modify the code to specify columns/variables, as can be done with the distinct function?

df.unique <- distinct(df, var1, var2, var3)

Upvotes: 0

Views: 321

Answers (1)

akrun

Reputation: 887531

If the duplicates are defined by a set of columns, pass only those columns to duplicated() and use the resulting logical index to subset the full data frame

nm1 <- paste0("var", 1:3)
df[!(duplicated(df[nm1]) | duplicated(df[nm1], fromLast = TRUE)), ]
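To see what this does, here is a minimal sketch on a made-up data frame. The data and the `other` column are assumptions for illustration only; the `var1` to `var3` names follow the question.

```r
# Hypothetical toy data: rows 1 and 2 share var1..var3, as do rows 4 and 5
df <- data.frame(
  var1  = c(1, 1, 2, 3, 3),
  var2  = c("a", "a", "b", "c", "c"),
  var3  = c(TRUE, TRUE, FALSE, TRUE, TRUE),
  other = c(10, 20, 30, 40, 50)
)

nm1 <- paste0("var", 1:3)   # columns that define a duplicate
key <- df[nm1]              # only these columns are compared

# duplicated() flags the second and later copies; fromLast = TRUE flags
# the first and earlier copies. Together they flag every copy.
dup <- duplicated(key) | duplicated(key, fromLast = TRUE)

# Keep all columns, but only rows whose key occurs exactly once
result <- df[!dup, ]
result
```

Only the third row should remain, since it is the only one whose `var1`, `var2`, `var3` combination occurs once. The `other` column is kept in the output but plays no part in the comparison.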

Upvotes: 2
