LittleFish

Reputation: 61

Get the unique values of two vectors keeping the order of both original

I am trying to get a vector of the unique elements of two vectors that respects the order of both of the original vectors.

Both vectors are sampled from a longer "hidden" vector that contains only unique entries (no repeats are allowed), which ensures that v1 and v2 have a compatible order (i.e. v1 <- c("Z", "A", ...) and v2 <- c("A", "Z", ...) cannot occur together).

The order is arbitrary, so I cannot use any simple order() or sort(). An example below:

v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")

Result desired:

c("Z", "A", "T", "F", "Q", "D")

Further explanation: v1 establishes the relationship "Z" < "A" < "F" < "D" and v2 states "A" < "T" < "F" < "Q" < "D", so the sequence that satisfies both v1 and v2 is "Z" < "A" < "T" < "F" < "Q" < "D".

I understand this case is fully determined (the two vectors completely define the order of all elements), but there will be cases where they do not. In those cases, any permutation that respects both orderings would be a satisfactory solution.
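To make the acceptance condition concrete, here is a minimal sketch of a check for whether a candidate result respects both orderings. The helper name respects_order is hypothetical and only for illustration; it assumes each input vector has no repeated elements, as described above.

```r
# Hypothetical helper: TRUE if every element of v appears in x
# and in the same relative order as in v (assumes v has no repeats).
respects_order <- function(x, v) {
  pos <- match(v, x)                 # positions of v's elements within x
  !anyNA(pos) && !is.unsorted(pos)   # all present, and positions increase
}

v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
candidate <- c("Z", "A", "T", "F", "Q", "D")

respects_order(candidate, v1) && respects_order(candidate, v2)
#[1] TRUE
```

Any vector for which both calls return TRUE would count as a valid answer.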

Any tips will be appreciated.

Upvotes: 4

Views: 356

Answers (2)

GKi

Reputation: 39667

You can take the unique elements of v1 and v2 combined, then re-sort them using match() against v2 and v1 in turn, repeating until nothing changes.

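x <- unique(c(v1, v2))        # v1's elements, then the new ones from v2
repeat {
  y <- x                      # remember the state before this pass
  i <- match(v2, x)           # positions currently held by v2's elements
  x[sort(i)] <- x[i]          # refill those positions in v2's order
  i <- match(v1, x)           # same again for v1
  x[sort(i)] <- x[i]
  if (identical(x, y)) break  # stop once a full pass changes nothing
}
x
#[1] "Z" "A" "T" "F" "Q" "D"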

Alternatively, you can take the elements common to v1 and v2 as anchor points and then join the subsets of v1 and v2 that lie between them:

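i <- v2[na.omit(match(v1, v2))]  # common elements (the anchor points)
j <- c(0, match(i, v2))          # anchor positions in v2, with a leading 0
i <- c(0, match(i, v1))          # anchor positions in v1, with a leading 0
# For each anchor, take the v1 elements up to it, then the v2 elements
# strictly between the previous anchor and this one.
unique(c(unlist(lapply(seq_along(i)[-1], function(k) {
  c(v1[head((i[k-1]:i[k]), -1)], v2[head((j[k-1]:j[k])[-1], -1)])
})), v1, v2))                    # appending v1, v2 picks up the remainder
#[1] "Z" "A" "T" "F" "Q" "D"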
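For the underdetermined cases mentioned in the question, a different technique from the two shown above is to treat each pair of neighbouring elements as a "comes before" constraint and run a topological sort (Kahn's algorithm). The following is only a sketch under that assumption; the function name topo_merge is hypothetical, and it relies on the inputs having no repeated elements.

```r
# Sketch of a topological sort over the "comes before" pairs of all inputs.
# topo_merge is a hypothetical name, not an existing R function.
topo_merge <- function(...) {
  vs    <- list(...)
  nodes <- unique(unlist(vs))
  # every pair of neighbours inside a vector is one constraint from -> to
  from <- unlist(lapply(vs, head, -1))
  to   <- unlist(lapply(vs, tail, -1))
  out  <- character(0)
  while (length(nodes) > 0) {
    # an element is ready when no remaining constraint points at it
    ready <- nodes[!nodes %in% to]
    if (length(ready) == 0) stop("the input orders contradict each other")
    nxt   <- ready[1]
    out   <- c(out, nxt)
    nodes <- nodes[nodes != nxt]
    keep  <- from != nxt             # drop constraints that are now satisfied
    from  <- from[keep]
    to    <- to[keep]
  }
  out
}

v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
topo_merge(v1, v2)
#[1] "Z" "A" "T" "F" "Q" "D"
```

When several elements are ready at once, this sketch simply picks the first one, so it returns one valid permutation out of possibly many. It also accepts more than two vectors and stops with an error if the orders are incompatible.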

Upvotes: 4

iago

Reputation: 3256

For this example, the following code works. First define auxiliary vectors w1 and w2, depending on which vector begins with the first common element, and another vector w into which the missing elements are inserted in order.

A for loop would be clearer and would avoid this cumbersome code, but, in principle, this version is faster and shorter.

w <- w1 <- unlist(ifelse(intersect(v1,v2)[1] == v1[1], list(v2), list(v1)))
w2 <- unlist(ifelse(intersect(v1,v2)[1] == v1[1], list(v1), list(v2)))
unique(lapply(setdiff(w2,w1), function(elmt) w <<- append(w, elmt, after = match(w2[match(elmt,w2)-1],w)))[[length(setdiff(w2,w1))]])
#[1] "Z" "A" "T" "F" "Q" "D"
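The same insertion idea can be sketched as a plain for loop. This is a simplified variant rather than a line-by-line translation of the code above: it always starts from v1 and inserts each missing element of v2 directly after its predecessor in v2. The function name merge_ordered is hypothetical, and the sketch assumes the two vectors have compatible orders and no repeats.

```r
# Hypothetical for-loop variant: start from v1, insert the elements of v2
# that are missing, each one right after its predecessor in v2.
merge_ordered <- function(v1, v2) {
  w <- v1
  for (k in seq_along(v2)) {
    elmt <- v2[k]
    if (elmt %in% w) next            # already placed, nothing to do
    # no predecessor: put it at the front; otherwise right after v2[k - 1],
    # which is guaranteed to be in w by the time we get here
    after <- if (k == 1) 0L else match(v2[k - 1], w)
    w <- append(w, elmt, after = after)
  }
  w
}

v1 <- c("Z", "A", "F", "D")
v2 <- c("A", "T", "F", "Q", "D")
merge_ordered(v1, v2)
#[1] "Z" "A" "T" "F" "Q" "D"
```

Because v2 is processed from left to right, there is no need to decide up front which vector contains the first common element.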

Upvotes: 1
