Reputation: 69
I'm trying to organize a sequence of data according to a given sequence. For example, the given sequence I have is
set.seed(1)
given_seq <- sample(rep(1:3,2))
The data and its associated sequence
dat_seq <- rep(1:3,2)
dat_value <- rnorm(6)
I want to reorder the data to follow the given sequence. That is, the values 1, 2, 3 act as labels for the data. For example,
dat_value
[1] 1.5952808 0.3295078 -0.8204684 0.4874291 0.7383247 0.5757814
dat_seq
[1] 1 2 3 1 2 3
given_seq
[1] 2 3 3 1 1 2
Then I expect the second and fifth data values (with label 2) to end up in the first and sixth positions.
I can see that the reordered sequence is not unique, but I'm not sure how to do this.
Upvotes: 1
Views: 971
Reputation: 48221
Here's another option:
dat_value[match(rank(given_seq, ties.method = "random"), rank(dat_seq, ties.method = "random"))]
# [1] 0.7383247 0.5757814 -0.8204684 1.5952808 0.4874291 0.3295078
First we convert the two sequences into ones that have no repeated elements; e.g.,
rank(given_seq, ties.method = "random")
# [1] 3 5 6 1 2 4
That is, if two entries of given_seq are, say, (1,1), then they will randomly be converted into (1,2) or (2,1). The same is done with dat_seq and, consequently, we can match them and reorder dat_value accordingly. Note that this method introduces some randomization (ties between equal labels are broken at random, so the result can differ between runs), which may or may not be desirable in your application.
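If the randomization is not wanted, one possible variation (a sketch, not part of the answer above) is to break ties by position with ties.method = "first". The data is hard-coded from the question so the example does not depend on the random number generator; idx and reordered are names introduced here for illustration.

```r
# Data copied from the question (hard-coded, so no RNG is involved).
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# ties.method = "first" breaks ties by position, so the result is reproducible.
idx <- match(rank(given_seq, ties.method = "first"),
             rank(dat_seq,   ties.method = "first"))
reordered <- dat_value[idx]

# Sanity check: the labels of the reordered data must equal given_seq.
stopifnot(all(dat_seq[idx] == given_seq))

reordered
# [1]  0.3295078 -0.8204684  0.5757814  1.5952808  0.4874291  0.7383247
```

The same check, all(dat_seq[idx] == given_seq), can be used to confirm that any of the other approaches in this thread produced a valid ordering.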
Upvotes: 2
Reputation: 1830
This also works and is probably even faster, although it may be harder to understand:
dat_value[order(dat_seq)][order(order(given_seq))]
First, we reorder dat_value so that it corresponds to the label sequence c(1,1,2,2,3,3).
Then we index into that sorted vector using the rank of each entry of given_seq. Calling order twice, order(order(given_seq)), yields exactly those ranks, with ties broken by position.
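To make the two steps easier to follow, here is a small sketch that spells out the intermediate results on the question's data (hard-coded so it does not depend on the random number generator). The names by_label, pos and result are introduced here for illustration only.

```r
# Data copied from the question.
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Step 1: sort the values by label, giving label order 1, 1, 2, 2, 3, 3.
by_label <- dat_value[order(dat_seq)]

# Step 2: order(order(x)) is the rank of each element of x, ties broken by position.
pos <- order(order(given_seq))
pos
# [1] 3 5 6 1 2 4
stopifnot(all(pos == rank(given_seq, ties.method = "first")))

# Step 3: pick, for each slot of given_seq, the value sitting at that rank.
result <- by_label[pos]
result
# [1]  0.3295078 -0.8204684  0.5757814  1.5952808  0.4874291  0.7383247
```

Because order breaks ties by position, this version is deterministic: among equal labels, values keep their original relative order.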
Upvotes: 1
Reputation: 145965
I would just make the labels unique and use the names attribute normally.
names(dat_value) = make.unique(as.character(dat_seq))
dat_value[make.unique(as.character(given_seq))]
# 2 3 3.1 1 1.1 2.1
# 0.3295078 -0.8204684 0.5757814 1.5952808 0.4874291 0.7383247
You can always strip the names off later if the made-unique names don't work for your use case.
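As a sketch of that last step, assuming a plain unnamed vector is what you want, unname() drops the names after the lookup. The data is hard-coded from the question, and result is a name introduced here for illustration.

```r
# Data copied from the question.
dat_value <- c(1.5952808, 0.3295078, -0.8204684, 0.4874291, 0.7383247, 0.5757814)
dat_seq   <- c(1, 2, 3, 1, 2, 3)
given_seq <- c(2, 3, 3, 1, 1, 2)

# Label the values "1", "2", "3", "1.1", "2.1", "3.1".
names(dat_value) <- make.unique(as.character(dat_seq))

# Look up by the uniquified target labels, then drop the names.
result <- unname(dat_value[make.unique(as.character(given_seq))])
result
# [1]  0.3295078 -0.8204684  0.5757814  1.5952808  0.4874291  0.7383247
```

This lookup relies on each label occurring the same number of times in dat_seq and given_seq; a label that occurs more often in given_seq would have no matching name and come back as NA.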
Upvotes: 1