Find duplicated character values in two columns with dplyr

Question

I have a data set with columns containing first names and last names. I want to keep only the rows in which the same combination of first name and last name appears more than once.

For example, if the first name Peter and the last name Parker occur together several times in the data, I want to keep those rows.

For now, I tried:

library(dplyr)
dat %>%
  filter(duplicated(as.numeric(`First name`)) & duplicated(as.numeric(`Last name`)))

However, the rows returned do not contain duplicated combinations of first name and last name.

@arg0naut

dat %>%
  filter(duplicated(paste0(`First name`, `Last name`)))

    # A tibble: 5 x 2
      `First name` `Last name`
      <chr>        <chr>
    1 Frank        Seehaus
    2 Nadine       Urseanu
    3 Rudolf       Schicker
    4 Renate       Kaymer
    5 Brigitte     Reibenspies

I want to see:

    # A tibble: 5 x 2
      `First name` `Last name`
      <chr>        <chr>
    1 Peter        Parker
    2 Peter        Parker
    3 Peter        Parker
    ...

arg0naut91 · Accepted Answer

You could try:

library(dplyr)

dat %>%
  filter(duplicated(paste0(`First name`, `Last name`)))

Output on the basis of data below:

  First name Last name
1      Peter    Parker
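
Only one row comes back because `duplicated()` marks the second and later occurrences of a value, never the first. Here is a minimal sketch of that behavior on the example names. The vectors `first`, `last`, and `key` are illustrative names, not taken from the question. Assuming the columns are character vectors, it also suggests why the `as.numeric()` attempt in the question misfires:

```r
# Illustrative vectors mirroring the example data.
first <- c("Peter", "Peter", "John", "John")
last  <- c("Parker", "Parker", "Biscuit", "Chocolate")

# duplicated() is FALSE for the first occurrence and TRUE for later repeats.
key <- paste0(first, last)
later_only <- duplicated(key)
# FALSE  TRUE FALSE FALSE

# Adding a scan from the end also flags the first occurrence.
all_copies <- duplicated(key) | duplicated(key, fromLast = TRUE)
# TRUE  TRUE FALSE FALSE

# as.numeric() on names yields NA for every element, so every row after
# the first counts as a duplicate, whatever the name actually is.
coerced <- suppressWarnings(as.numeric(first))
na_flags <- duplicated(coerced)
# FALSE  TRUE  TRUE  TRUE
```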

If you'd like to have all the duplicated rows returned, including the first occurrence, you could do:

dat %>%
  group_by(`First name`, `Last name`) %>%
  filter(n() > 1)

Output on the basis of data below:

# A tibble: 2 x 2
# Groups:   First name, Last name [1]
  `First name` `Last name`
                
1 Peter        Parker     
2 Peter        Parker
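
An alternative sketch, not part of the original answer: `duplicated()` also accepts a data frame and compares whole rows, so the two name columns can be checked together without pasting them into one string. That sidesteps `paste0()` collisions, such as `paste0("ab", "c")` and `paste0("a", "bc")` both giving "abc", and the result carries no grouping. The names `keys` and `all_dups` are illustrative.

```r
library(dplyr)

# Same example data as in the answer, repeated so the snippet runs on its own.
dat <- data.frame(
  `First name` = c("Peter", "Peter", "John", "John"),
  `Last name` = c("Parker", "Parker", "Biscuit", "Chocolate"),
  check.names = FALSE
)

# Compare whole rows of the two name columns.
keys <- dat[, c("First name", "Last name")]

# Keep every row whose combination occurs more than once,
# including the first occurrence.
all_dups <- dat %>%
  filter(duplicated(keys) | duplicated(keys, fromLast = TRUE))

all_dups
```

On the example data this should keep both Peter Parker rows and drop the two John rows, matching the `group_by()` approach.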

Example data:

dat <-
  data.frame(
    `First name` = c("Peter", "Peter", "John", "John"),
    `Last name` = c("Parker", "Parker", "Biscuit", "Chocolate"),
    check.names = FALSE
  )

dat

  First name Last name
1      Peter    Parker
2      Peter    Parker
3       John   Biscuit
4       John Chocolate
