rellis025

Reputation: 1

Identify and keep only rows with duplicate elements in r

I have a large data frame with 20+ columns, and I need to identify and keep the rows that have duplicate values in a specified set of columns. My plan was to create two new columns: the first would hold the concatenated values of those columns, and the second would be a binary flag indicating whether the value in the first column is duplicated. My df looks like this:

enter image description here

For the first column I tried:

res1 <-mutate(Prac_df, Con_cat =apply(Prac_df[order(PIn, Age, Sex),], 1, function(x) paste0(x, collapse = "_")))

I don't think that worked, and I'm not sure how to create the second column, which I will need in order to run a logistic regression.

After the two columns are added, it would look like this: enter image description here
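
The two-column plan described above can be sketched in base R. This is only a rough illustration under assumptions: the toy data is made up, the column names `PIN`, `Age` and `Sex` are inferred from the attempt above, and `Dup` and `dups_only` are hypothetical names.

```r
# Hypothetical toy data; PIN, Age and Sex are assumed column names.
Prac_df <- data.frame(
  PIN = c(1, 1, 2, 3, 3),
  Age = c(30, 30, 41, 25, 25),
  Sex = c("F", "F", "M", "M", "F"),
  stringsAsFactors = FALSE
)

# Column 1: one key per row, built only from the columns of interest.
Prac_df$Con_cat <- paste(Prac_df$PIN, Prac_df$Age, Prac_df$Sex, sep = "_")

# Column 2: 1 if the key occurs more than once anywhere in the data, else 0.
# duplicated() on its own skips the first occurrence, so scan from both ends.
Prac_df$Dup <- as.integer(
  duplicated(Prac_df$Con_cat) | duplicated(Prac_df$Con_cat, fromLast = TRUE)
)

# Keep only the rows whose key is duplicated.
dups_only <- Prac_df[Prac_df$Dup == 1, ]
```

Pasting named columns together, rather than calling `apply()` over whole rows, keeps the key limited to the columns that define a duplicate.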

Upvotes: -1

Views: 725

Answers (1)

h1427096

Reputation: 293

Try this:

library(dplyr)

# Within each PIN/Age/Sex group, flag every row after the first as a duplicate.
res1 <- Prac_df %>%
  group_by(PIN, Age, Sex) %>%
  mutate(isDuplicated = row_number() > 1) %>%
  ungroup()
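
Note that `row_number() > 1` leaves the first row of each group as `FALSE`. If the goal is to keep every row that has a duplicate, including the first occurrence, a variant using `n() > 1` might fit better. The sketch below assumes made-up toy data, and `res_all` and `dups_only` are hypothetical names:

```r
library(dplyr)

# Hypothetical toy data; PIN, Age and Sex are assumed column names.
Prac_df <- tibble(
  PIN = c(1, 1, 2, 3, 3),
  Age = c(30, 30, 41, 25, 25),
  Sex = c("F", "F", "M", "M", "F")
)

# n() is the group size, so every row of a group with 2+ members is flagged.
res_all <- Prac_df %>%
  group_by(PIN, Age, Sex) %>%
  mutate(isDuplicated = n() > 1) %>%
  ungroup()

# Keep only the rows that belong to a duplicated group.
dups_only <- res_all %>%
  filter(isDuplicated)
```

For a 0/1 column to feed into a logistic regression, `as.integer(isDuplicated)` converts the logical flag.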

Upvotes: 2
