KT_1

Reputation: 8494

Identify duplicates in a df in a particular column in R

For a sample dataframe:

df <- structure(list(
  code = c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1"),
  name = c("katie", "katie", "sally", "tom", "amy", "amy", "ash", "james"),
  number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3)),
  .Names = c("code", "name", "number"),
  class = c("tbl_df", "tbl", "data.frame"),
  row.names = c(NA, -8L),
  spec = structure(list(
    cols = structure(list(
      code = structure(list(), class = c("collector_character", "collector")),
      name = structure(list(), class = c("collector_character", "collector")),
      number = structure(list(), class = c("collector_double", "collector"))),
      .Names = c("code", "name", "number")),
    default = structure(list(), class = c("collector_guess", "collector"))),
    .Names = c("cols", "default"), class = "col_spec"))

I want to produce a dataframe of rows that have duplicates in one specific column only.

I know I can do:

df[duplicated(df),]

But that compares entire rows. For my larger real dataframe, I want to check for duplicates in one particular column only.

Any ideas?

Upvotes: 0

Views: 37

Answers (1)

s_baldur

Reputation: 33603

duplicated() also accepts vectors, so you can pass it just the column you care about:

df[duplicated(df$name), ]
  code  name number
2   a1 katie    3.5
6   f5   amy    4.0
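
Note that duplicated() flags only the second and later occurrences of a value, so the first "katie" and the first "amy" are not returned above. If you want every row whose value is repeated, first occurrence included, one possible sketch is to combine a forward and a reverse scan using the `fromLast` argument. The plain data.frame below is a stand-in for the question's data, and the choice of the `code` column and the names `later_only`, `is_dup` and `all_dups` are only for illustration:

```r
# Plain data.frame stand-in for the question's data (same values).
df <- data.frame(
  code   = c("a1", "a1", "b2", "v4", "f5", "f5", "h7", "a1"),
  name   = c("katie", "katie", "sally", "tom", "amy", "amy", "ash", "james"),
  number = c(3.5, 3.5, 2, 6, 4, 4, 7, 3),
  stringsAsFactors = FALSE
)

# duplicated() alone marks only the second and later occurrences.
later_only <- df[duplicated(df$code), ]

# OR-ing with a scan from the end also marks the first occurrence
# of every repeated value.
is_dup <- duplicated(df$code) | duplicated(df$code, fromLast = TRUE)
all_dups <- df[is_dup, ]
```

Here `later_only` should hold rows 2, 6 and 8, while `all_dups` should hold rows 1, 2, 5, 6 and 8, that is, every row whose `code` appears more than once.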

Upvotes: 2
