Taylor Young

Reputation: 3

How to identify and mark duplicate data in a specific column

I have a data set that contains duplicate IDs and dates in different columns. I am trying to identify and mark the duplicate IDs so that I can then use an ifelse statement to break the data set down further.

[Image: sample of data set]

If you look at the data, you will see that the Case: Case Number column contains a duplicated ID, while the dates in the Questionnaire: Created Date column are different. I want to identify the duplicated values in the Case Number column and tell whether their dates in the Date column are the same or different. Ideally I would also like an ifelse statement that marks the duplicated numbers, but I am not sure how to proceed.

The end goal is to remove duplicates that have the same date. Any ideas?

Upvotes: 0

Views: 855

Answers (1)

Jose Victor Zambrana

Reputation: 519

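Here's an example using dplyr: group by ID and date, then flag every row whose ID/date combination occurs more than once.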

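library(lubridate)  # for as_date()
library(dplyr)

# Toy data: ID 1 appears twice with the same date
x <- data.frame(ID = c(1, 1, 2, 3), date = as_date(c(1, 1, 2, 4))) %>%
  group_by(ID, date) %>%
  mutate(duplicated = n() > 1) %>%  # TRUE when the ID/date pair occurs more than once
  ungroup()

x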

Output

     ID date       duplicated
  <dbl> <date>     <lgl>
1     1 1970-01-02 TRUE
2     1 1970-01-02 TRUE
3     2 1970-01-03 FALSE
4     3 1970-01-05 FALSE
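
To get from the flag to the stated end goal (dropping rows that repeat the same ID and the same date), something along these lines might work. This is only a sketch: the data frame `cases` and the column names `case_number` and `created_date` are made-up stand-ins for the real columns, and the `status` labels are illustrative.

```r
library(dplyr)

# Hypothetical sample data; the column names are assumptions, not the real ones.
cases <- data.frame(
  case_number  = c(1, 1, 2, 2, 3),
  created_date = as.Date(c("2020-01-01", "2020-01-01",
                           "2020-02-01", "2020-02-05",
                           "2020-03-01"))
)

flagged <- cases %>%
  group_by(case_number) %>%
  mutate(
    dup_id       = n() > 1,                       # case number occurs more than once
    dates_differ = n_distinct(created_date) > 1,  # case number has more than one distinct date
    # Nested ifelse() that labels each row (labels are illustrative)
    status = ifelse(!dup_id, "unique",
             ifelse(dates_differ, "duplicate, different dates",
                                  "duplicate, same date"))
  ) %>%
  ungroup()

# Keep the first row of each case number / date pair and drop exact repeats
deduped <- distinct(flagged, case_number, created_date, .keep_all = TRUE)
```

Here `flagged` keeps every row with its labels, while `deduped` drops only the second copy of case 1. Both rows for case 2 survive because their dates differ.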

Upvotes: 1
