How can I remove shared values from a list of vectors

Question

I have a list:

x <- list("a" = c(1:6,32,24) , "b" = c(1:4,8,10,12,13,17,24), 
          "F" = c(1:5,9:15,17,18,19,20,32))
x

$a
[1]  1  2  3  4  5  6 32 24

$b
[1]  1  2  3  4  8 10 12 13 17 24

$F
[1]  1  2  3  4  5  9 10 11 12 13 14 15 17 18 19 20 32

Each vector in the list shares some elements with the others. How can I remove the shared values to get the following result?

$a
[1]  1  2  3  4  5  6 32 24

$b
[1]  8 10 12 13 17

$F
[1]  9 11 14 15 18 19 20

As you can see, the first vector does not change. Elements shared between the first and second vectors are removed from the second vector; the third vector is then compared with both the first and the second, and any elements it shares with either are removed from it. The goal is to cluster a data set (the original data set contains 590 objects).

Ben Bolker · Accepted Answer

x <- list("a" = c(1:6,32,24) , 
          "b" = c(1:4,8,10,12,13,17,24), 
          "F" = c(1:5,9:15,17,18,19,20,32))

This is inefficient since it re-makes the union of the previous set of lists at each step (rather than keeping a running total), but it was the first way I thought of.

for (i in seq_along(x)[-1]) {   ## like 2:length(x), but safe if x has fewer than 2 elements
   ## construct union of all previous lists
   prev <- Reduce(union,x[1:(i-1)])
   ## remove shared elements from the current list
   x[[i]] <- setdiff(x[[i]],prev)
}

You could probably improve this by initializing prev as numeric(0) and updating it to c(prev,x[[i-1]]) at each step (although this grows a vector at each step, which is a slow operation). If you don't have a gigantic data set and don't have to do this operation millions of times, the loop above is probably good enough.
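A minimal sketch of that running-total idea, assuming every element of the list is a plain numeric vector. The helper name remove_seen is hypothetical, not part of base R. Because each earlier vector has already been stripped of shared values by the time it is appended, prev only ever receives values it did not contain before.

```r
## Hypothetical helper: drop from each vector any value that already
## appeared in an earlier vector of the list.
remove_seen <- function(x) {
  prev <- numeric(0)                   # running collection of values seen so far
  for (i in seq_along(x)[-1]) {        # start at the second vector; the first is untouched
    prev <- c(prev, x[[i - 1]])        # add the (already filtered) previous vector
    x[[i]] <- setdiff(x[[i]], prev)    # keep only values not seen earlier
  }
  x
}

x <- list("a" = c(1:6, 32, 24),
          "b" = c(1:4, 8, 10, 12, 13, 17, 24),
          "F" = c(1:5, 9:15, 17, 18, 19, 20, 32))
res <- remove_seen(x)
res
```

For the example list, res$a is unchanged, res$b is 8 10 12 13 17, and res$F is 9 11 14 15 18 19 20, which is the result asked for in the question. Note that setdiff() also drops duplicates within a vector, so repeated values inside the second and later vectors would be collapsed as well.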
