Ashley

Reputation: 69

How to sort a vector according to a given sequence in R

I'm trying to reorder a set of data values so that their labels follow a given sequence. For example, the given sequence is:

set.seed(1)
given_seq <- sample(rep(1:3,2))

The data values and their associated label sequence are:

dat_seq <- rep(1:3,2)
dat_value <- rnorm(6)

I want to reorder the data according to the given sequence. That is, the values 1, 2, 3 serve as labels for the data. For example,

dat_value
[1]  1.5952808  0.3295078 -0.8204684  0.4874291  0.7383247  0.5757814

dat_seq
[1] 1 2 3 1 2 3

given_seq
[1] 2 3 3 1 1 2

Then I expect the second and fifth data values (the ones with label 2) to end up in the first and sixth positions, in either order.

I can see that the reordered sequence is not unique, but I'm not sure how to produce one.

Upvotes: 1

Views: 971

Answers (3)

Julius Vainora

Reputation: 48221

Here's another option:

dat_value[match(rank(given_seq, ties.method = "random"), rank(dat_seq, ties.method = "random"))]
# [1]  0.7383247  0.5757814 -0.8204684  1.5952808  0.4874291  0.3295078

First we convert the two sequences into ones that have no repeated elements; e.g.,

rank(given_seq, ties.method = "random")
# [1] 3 5 6 1 2 4

That is, if two entries of given_seq are, say, (1,1), then they are randomly converted into (1,2) or (2,1). The same is done with dat_seq, so the two rank vectors can be matched and dat_value reordered accordingly. Because ties are broken at random, the result can differ from run to run, which may or may not be desirable in your application.
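
If the randomness is not wanted, a deterministic variant of the same idea can be sketched by swapping in ties.method = "first", which breaks ties by order of appearance. This is an assumption about what a reproducible result should look like, not part of the original answer; the vectors are typed in from the question so the snippet does not depend on the RNG.

```r
# Values copied from the question, so no random number generation is involved
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# ties.method = "first" ranks tied labels in order of appearance
idx <- match(rank(given_seq, ties.method = "first"),
             rank(dat_seq,   ties.method = "first"))
result <- dat_value[idx]

# The labels of the reordered values should reproduce given_seq
stopifnot(identical(dat_seq[idx], given_seq))
```

With this tie-breaking rule, values that share a label keep their original relative order.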

Upvotes: 2

Emil Bode

Reputation: 1830

This also works and is probably even faster, although it may be harder to understand:

dat_value[order(dat_seq)][order(order(given_seq))]

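First, we reorder dat_value so that it corresponds to the sorted label sequence c(1,1,2,2,3,3).
Then we need, for each position of given_seq, the slot its label occupies in that sorted sequence. Calling order twice gives exactly that: order(order(given_seq)) is the rank of each element, with ties broken by position, so indexing the sorted values by it arranges them to follow given_seq.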
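
The one-liner can be unpacked into steps, as in the sketch below. The intermediate names sorted_value and slot are made up for illustration, and the vectors are typed in from the question.

```r
# Values copied from the question
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Step 1: sort the values by label, so their labels read 1,1,2,2,3,3
sorted_value <- dat_value[order(dat_seq)]

# Step 2: order(order(x)) is the rank of each element, ties broken by position
slot <- order(order(given_seq))
print(slot)
# [1] 3 5 6 1 2 4

# Step 3: position i of the result takes the slot[i]-th sorted value
result <- sorted_value[slot]

# Sanity check: the sorted labels, indexed the same way, give back given_seq
stopifnot(identical(sort(dat_seq)[slot], given_seq))
```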

Upvotes: 1

Gregor Thomas

Reputation: 145965

I would just make the labels unique and use the names attribute normally.

names(dat_value) = make.unique(as.character(dat_seq))
dat_value[make.unique(as.character(given_seq))]
 #         2          3        3.1          1        1.1        2.1 
 # 0.3295078 -0.8204684  0.5757814  1.5952808  0.4874291  0.7383247 

You can always strip the names off later if the suffixed names (3.1, 1.1, 2.1) don't work for your use case.
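
As a minimal sketch of stripping the names, unname() from base R returns the same values without the names attribute. The variable names result and plain are made up for illustration, and the vectors are typed in from the question.

```r
# Values copied from the question
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Name each value by its label, made unique with suffixes (1, 2, 3, 1.1, ...)
names(dat_value) <- make.unique(as.character(dat_seq))

# Look the values up by the unique-ified target labels
result <- dat_value[make.unique(as.character(given_seq))]

# Drop the names once the reordering is done
plain <- unname(result)
stopifnot(is.null(names(plain)))
```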

Upvotes: 1
