Reputation: 1
Identify and keep only rows with duplicate elements in R
I have a large df with 20+ columns, and I need to identify and keep the rows that have duplicate elements in specified columns. My approach was going to be to create two new columns. The first column would hold the concatenated elements. The second column would be a binary flag telling me whether the value in the first column is duplicated. My df looks like this:
For the first column I tried:
res1 <- mutate(Prac_df, Con_cat = apply(Prac_df[order(PIn, Age, Sex), ], 1, function(x) paste0(x, collapse = "_")))
I don't think that worked, and I'm not sure how to create the second column, which I will need in order to run a logistic regression.
After my two columns are added, it would look like this:
Upvotes: -1
Views: 725
Reputation: 293
Try this:
library(dplyr)
res1 <- Prac_df %>%
  group_by(PIN, Age, Sex) %>%
  # TRUE for the second and later rows of each PIN/Age/Sex combination
  mutate(isDuplicated = row_number() > 1) %>%
  ungroup()
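Note that `row_number() > 1` leaves the first row of each duplicated combination marked FALSE. If you want every row that belongs to a duplicated combination flagged (and then kept), here is a minimal sketch in base R. The toy values are made up for illustration; only the column names PIN, Age and Sex come from the question, and `key_cols` and `dups_only` are just names I picked.

```r
# Toy data: the values are invented, only the column names come from the question
Prac_df <- data.frame(
  PIN = c(1, 1, 2, 3, 3, 3),
  Age = c(30, 30, 41, 25, 25, 26),
  Sex = c("F", "F", "M", "M", "M", "M"),
  stringsAsFactors = FALSE
)

# Columns that define a "duplicate" (assumed; adjust to your own columns)
key_cols <- c("PIN", "Age", "Sex")
keys <- Prac_df[key_cols]

# First column from the question: the key values pasted together
Prac_df$Con_cat <- do.call(paste, c(keys, sep = "_"))

# duplicated() alone marks only the later repeats; checking from both ends
# marks every row whose key combination occurs more than once
Prac_df$isDuplicated <- duplicated(keys) | duplicated(keys, fromLast = TRUE)

# Keep only the rows that are part of a duplicated combination
dups_only <- Prac_df[Prac_df$isDuplicated, ]
```

In the dplyr version above, swapping `row_number() > 1` for `n() > 1` should give the same all-rows flag, since `n()` is the size of the current group.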
Upvotes: 2