Reputation: 119

Finding duplicates by columns in r

I want to find duplicates horizontally and keeping the uniques. Please help me with this. I am sharing a sample dataset. Hope this helps.

X <- c(1,2,3,4,5)
Y <- c("India","India","Philippines","Netherlands","France")
Z <- c("India","India","Netherlands","France","France")
S <- c("India","France","Netherlands","France","India")
TableTest <- data.frame(X,Y,Z,S)
TableTest

Input dataset

 X           Y           Z           S
1 1       India       India       India
2 2       India       India      France
3 3 Philippines Netherlands Netherlands
4 4 Netherlands      France      France
5 5      France      France       India

Expected Output

 X           Y           Z  S
1 1       India          NA NA
2 2       India      France NA
3 3 Philippines Netherlands NA
4 4 Netherlands      France NA
5 5      France       India NA

Please help.

Upvotes: 0

Answers (4)

ThomasIsCoding

Reputation: 102625

Another base R option

TableTest[-1] <- do.call(rbind,lapply(apply(TableTest[-1],1,unique),`length<-`,ncol(TableTest)-1))

or a simpler version (thanks for advice by @Onyambu in the comments)

TableTest[-1] <- t(apply(TableTest[-1], 1, function(x)`length<-`(unique(x),ncol(TableTest[-1]))))

which gives

> TableTest
  X           Y           Z    S
1 1       India        <NA> <NA>
2 2       India      France <NA>
3 3 Philippines Netherlands <NA>
4 4 Netherlands      France <NA>
5 5      France       India <NA>

Upvotes: 2

slava-kohut

Reputation: 4233

My solution:

  TableTest[2:4] <- as.data.frame(t(apply(TableTest[2:4], 1, function(x) {
  xo <- ifelse(!duplicated(x), x, NA_character_)
  if (any(is.na(xo))) xo <- xo[!is.na(xo)]
  length(xo) <- ncol(TableTest) - 1
  xo
  })))

Output

> TableTest
  X           Y           Z    S
1 1       India        <NA> <NA>
2 2       India      France <NA>
3 3 Philippines Netherlands <NA>
4 4 Netherlands      France <NA>
5 5      France       India <NA>

Upvotes: 1

M Turgeon

Reputation: 196

I don't think you can do it by only using data.frames, because you're moving values across columns. But here's one way to do it using matrices:

X <- c(1,2,3,4,5)
Y <- c("India","India","Philippines","Netherlands","France")
Z <- c("India","India","Netherlands","France","France")
S <- c("India","France","Netherlands","France","India")

output <- apply(cbind(Y,Z,S), 1, function(row) {
    rm_dup <- unique(row)
    return(c(rm_dup, rep(NA_character_,
                         3 - length(rm_dup))))
})

t(output)

     [,1]          [,2]          [,3]
[1,] "India"       NA            NA  
[2,] "India"       "France"      NA  
[3,] "Philippines" "Netherlands" NA  
[4,] "Netherlands" "France"      NA  
[5,] "France"      "India"       NA

Upvotes: 0

r2evans

Reputation: 160827

TableTest[,-1] <- as.data.frame(t(apply(TableTest[,-1], 1, function(a) { a <- replace(a, duplicated(a), NA_character_); a[ order(is.na(a)) ]; })))
TableTest
#   X           Y           Z    S
# 1 1       India        <NA> <NA>
# 2 2       India      France <NA>
# 3 3 Philippines Netherlands <NA>
# 4 4 Netherlands      France <NA>
# 5 5      France       India <NA>

Upvotes: 2

Finding duplicates by columns in r

Answers (4)

Related Questions