Noor

How can I remove shared values from a list of vectors

I have a list:

x <- list("a" = c(1:6,32,24) , "b" = c(1:4,8,10,12,13,17,24), 
          "F" = c(1:5,9:15,17,18,19,20,32))
x

$a
[1]  1  2  3  4  5  6 32 24

$b
[1]  1  2  3  4  8 10 12 13 17 24

$F
[1]  1  2  3  4  5  9 10 11 12 13 14 15 17 18 19 20 32

Each vector in the list shares some elements with the others. How can I remove the shared values to get the following result?

$a
[1]  1  2  3  4  5  6 32 24

$b
[1]  8 10 12 13 17

$F
[1]  9 11 14 15 18 19 20

As you can see, the first vector does not change. Elements that the second vector shares with the first are removed from the second vector; then the third vector is compared with the first and second vectors, and any elements it shares with them are removed from it. The goal of this task is to cluster a dataset (the original data set contains 590 objects).
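Put another way, a value is kept only in the first vector of the list where it appears. A minimal sketch of that rule, using a hypothetical helper `remove_shared` (not part of the question or the answers below) and assuming the list has unique names. Unlike the setdiff-based answers, it would also drop repeated values inside the first vector, which does not matter for this example:

```r
# Hypothetical helper (not from the original post): keep each value only in the
# first vector of the list where it appears. Assumes names(x) are unique.
remove_shared <- function(x) {
  values <- unlist(x, use.names = FALSE)
  # Label every value with the name of the vector it came from, keeping list order
  groups <- factor(rep(names(x), lengths(x)), levels = names(x))
  # duplicated() is TRUE for every occurrence after the first one
  keep <- !duplicated(values)
  split(values[keep], groups[keep])
}

x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))
res <- remove_shared(x)
res
```

Because it makes a single pass over all values, this may scale better than comparing each vector against all earlier ones, though that has not been benchmarked here.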

Upvotes: 1

Answers (2)

James

You can use Reduce and setdiff on the list in reverse order to find all elements of the last vector that do not appear in any of the others. Wrap this in an lapply that runs over the partial sub-lists (x[1], x[1:2], x[1:3]) to get your desired output:

lapply(seq_along(x), function(y) Reduce(setdiff,rev(x[seq(y)])))
[[1]]
[1]  1  2  3  4  5  6 32 24

[[2]]
[1]  8 10 12 13 17

[[3]]
[1]  9 11 14 15 18 19 20
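To see what a single Reduce call does, here is the third element unrolled by hand (an illustration added for clarity, not part of the original answer). The last lines show one way, assumed here rather than taken from the answer, to keep the names a, b and F, which the lapply over seq_along(x) drops:

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

# rev(x[seq(3)]) is list(F, b, a); Reduce() folds from the left, so this is
# setdiff(setdiff(F, b), a): drop values shared with b, then those shared with a
via_reduce <- Reduce(setdiff, rev(x[seq(3)]))
by_hand    <- setdiff(setdiff(x$F, x$b), x$a)
identical(via_reduce, by_hand)
# [1] TRUE

# setNames() restores the names that lapply() drops when iterating over seq_along(x)
named <- setNames(lapply(seq_along(x), function(y) Reduce(setdiff, rev(x[seq(y)]))),
                  names(x))
```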

When scaling up, calling rev on every iteration may become an issue, so you might want to reverse the list once, outside the lapply, store it in a new variable, and subset that inside the function.
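A minimal sketch of that suggestion, assuming the variable names n and rx (chosen here for illustration). Reversing the first y elements of x gives the same list as taking the last y elements of rev(x):

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

n  <- length(x)
rx <- rev(x)  # reverse the whole list once, outside lapply
# rev(x[seq(y)]) equals the last y elements of rev(x), i.e. rx[(n - y + 1):n]
res <- lapply(seq_len(n), function(y) Reduce(setdiff, rx[(n - y + 1):n]))
names(res) <- names(x)
res
```

This only avoids building a reversed copy of each sub-list; the number of setdiff calls still grows with the position of each vector in the list.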

Upvotes: 5

Ben Bolker

x <- list("a" = c(1:6,32,24) , 
          "b" = c(1:4,8,10,12,13,17,24), 
          "F" = c(1:5,9:15,17,18,19,20,32))

This is inefficient since it re-makes the union of all the previous vectors at each step (rather than keeping a running total), but it was the first way I thought of.

## seq_along(x)[-1] is safe even when x has only one element (2:length(x) is not)
for (i in seq_along(x)[-1]) {
   ## construct union of all previous vectors
   prev <- Reduce(union, x[1:(i-1)])
   ## remove shared elements from the current vector
   x[[i]] <- setdiff(x[[i]], prev)
}

You could probably improve this by initializing prev as numeric(0) and setting prev to c(prev, x[[i-1]]) at each step (although this grows a vector at each step, which is a slow operation). If you don't have a gigantic data set, or don't have to do this operation millions of times, it's probably good enough.
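A sketch of the running-total variant described above, assuming prev starts as numeric(0) and the previous vector is appended with c() on each iteration. Appending the already-filtered previous vector gives the same set as the union of all the original earlier vectors, because the filtered part is exactly what prev did not contain yet:

```r
x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))

prev <- numeric(0)  # running total of values seen in earlier vectors
for (i in seq_along(x)[-1]) {
  # x[[i - 1]] (double brackets) extracts the previous vector itself, already filtered
  prev <- c(prev, x[[i - 1]])
  # remove every value already seen in an earlier vector
  x[[i]] <- setdiff(x[[i]], prev)
}
x
```

Since each filtered vector is disjoint from prev, prev only contains repeats if the first vector does, so it never grows beyond the total number of values.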

Upvotes: 1
