Reputation: 356
> read.delim("df.tsv")
col1 col2 group
1 3 2 aa
2 1 1 aa
3 4 1 aa
4 4 3 aa
5 5 3 ab
6 3 2 ab
7 4 1 ab
8 2 4 ab
9 4 2 ba
10 1 4 ba
11 3 1 ba
12 4 3 ba
13 4 2 bb
14 2 3 bb
15 3 1 bb
16 1 2 bb
I want to sort the columns col1 and col2 within each of the 4 groups, in the following way:
col1 col2 group
1 4 3 aa
2 3 2 aa
3 4 1 aa
4 1 1 aa
...
This could be accomplished by e.g. a "one row at a time" approach, first col1 and then col2, alternating for each row.
library(dplyr)
read.delim("df.tsv") %>%
group_by(group) %>%
arrange(ifelse(substr(group, 1,1) == "a", desc(col1), col1), # if first character in group name is "a", sort col1 in a descending manner, and ascending if it's "b"
ifelse(substr(group, 2,2) == "a", desc(col2), col2), # if second character in group name is also "a", sort also col2 in a descending manner, and ascending if it's "b"
.by_group = TRUE)
col1 col2 group
1 4 3 aa
2 4 1 aa
3 3 2 aa
4 1 1 aa
5 5 3 ab
6 4 1 ab
7 3 2 ab
8 2 4 ab
9 1 4 ba
10 3 1 ba
11 4 3 ba
12 4 2 ba
13 1 2 bb
14 2 3 bb
15 3 1 bb
16 4 2 bb
However, this does not fulfill the 3rd criterion, the "simultaneous sorting one row at a time".
col1 col2 group
1 4 3 aa
2 3 2 aa
3 4 1 aa
4 1 1 aa
5 5 3 ab
6 4 1 ab
7 3 2 ab
8 2 4 ab
9 1 4 ba
10 4 3 ba
11 3 1 ba
12 4 2 ba
13 1 2 bb
14 3 1 bb
15 2 3 bb
16 4 2 bb
There are a couple of answers that actually do the proposed task, so I think a tie-breaker could be that the algorithm is flexible with respect to the number of columns to sort, e.g. 3:
col1 col2 col3 group
3 2 4 aaa
1 1 2 aaa
4 1 4 aaa
4 3 1 aaa
5 3 3 aab
3 2 2 aab
4 1 1 aab
2 4 1 aab
4 2 3 aba
1 4 3 aba
3 1 2 aba
4 3 3 aba
3 2 4 abb
1 1 2 abb
4 1 4 abb
4 3 1 abb
4 2 1 baa
2 3 2 baa
3 1 2 baa
1 2 1 baa
5 3 3 bab
3 2 2 bab
4 1 1 bab
2 4 1 bab
4 2 3 bba
1 4 3 bba
3 1 2 bba
4 3 3 bba
4 2 1 bbb
2 3 2 bbb
3 1 2 bbb
1 2 1 bbb
The output should be
col1 col2 col3 group
4 3 1 aaa
3 2 4 aaa
4 1 4 aaa
1 1 2 aaa
5 3 3 aab
2 4 1 aab
4 1 1 aab
3 2 2 aab
4 2 3 aba
3 1 2 aba
4 3 3 aba
1 4 3 aba
4 1 4 abb
1 1 2 abb
4 3 1 abb
3 2 4 abb
1 2 1 baa
2 3 2 baa
3 1 2 baa
4 2 1 baa
2 4 1 bab
5 3 3 bab
4 1 1 bab
3 2 2 bab
1 4 3 bba
3 1 2 bba
4 2 3 bba
4 3 3 bba
1 2 1 bbb
3 1 2 bbb
4 2 1 bbb
2 3 2 bbb
Currently the 2 suggested solutions do not work when 3 or more columns are included, they sort based on only 2 columns.
If e.g. group=='aba', the first row of this group should be the one that includes the highest value in col1; the 2nd row the one that includes the (remaining) lowest value in col2; the 3rd row the one that includes the (remaining) highest value in col3, and the 4th row is the remaining row. However, this should be flexible to allow for more than 4 rows per group, in that case the 4th row should be the one that includes the (remaining) highest value in col1; the 5th row should be the one that includes the (remaining) lowest value in col2; etc.
Example: For the 2nd row of the 'aba' group, in the case that there is a tie between 2 rows for the lowest (remaining) value in col2, e.g.
row-a 3 1 4 aba
row-b 2 1 4 aba
(notice that there is a 1 in col2 in both rows), ideally then the chosen 2nd row would be row-a, since the col1 has to be sorted in a descending manner ('a') in this group, and 3>2, and for col3 4==4 anyway.
If instead
row-a 3 1 4 aba
row-b 2 1 5 aba
let the priority go col3>col2>col1, since the cycle goes col1>col2>col3... so the 2nd row would be row-b, since 5>4.
So to generalize, if there were 5 columns and the group were 'aabaa', and there is a tie for choosing the 3rd row between 2 rows:
row-a 3 2 1 3 3 aabaa
row-b 5 4 1 4 2 aabaa
(col3 == 1 in both), then the one to select would be row-a since for col5 3>2. If instead
row-a--> 3 2 1 3 3
row-b--> 5 4 1 4 3
(col5==3 in both), then choose row-b since for col4 4>3.
Upvotes: 3
Views: 255
Reputation: 8844
On second thought, I think I can just pass that option to you. You can specify any cycling method you want now.
alt_order <- function(..., type, cyc) {
cols <- unname(list(...))
stopifnot(
# sanity checks; you may skip if you think they are unnecessary
length(unique(lengths(cols))) == 1L,
length(cols) == length(type),
all(unlist(type) %in% c(1L, -1L))
)
cols <- mapply(`*`, cols, type, SIMPLIFY = FALSE)
out <- integer(length(cols[[1L]]))
this <- cols
for (i in seq_along(out)) {
out[[i]] <- do.call(order, this)[[1L]]
cols <- lapply(cols, `is.na<-`, out[[i]])
this <- cols[cyc(i)]
}
out
}
cyc
should be a function that accepts a single integer as input and returns a vector of integers. For example, if you have 3 columns and you want to replicate the rev
cycling behavior as I described in the comment below, you can do this
mycyc <- function(i) list(1:3, 3:1)[[(i - 1) %% 2L + 1L]]
df %>% group_by(group) %>% slice(alt_order(col1, col2, col3, type = ab2sign(group), cyc = mycyc))
Well, perhaps a not efficient but simple solution is to just sort the two columns continuously, swap the major column each time, and discharge the first element until no element is left to be sorted. Here is the function.
alt_order <- function(..., type) {
cols <- unname(list(...))
stopifnot(
# sanity checks; you may skip if you think they are unnecessary
length(unique(lengths(cols))) == 1L,
length(cols) == length(type),
all(unlist(type) %in% c(1L, -1L))
)
cols <- mapply(`*`, cols, type, SIMPLIFY = FALSE)
out <- integer(length(cols[[1L]]))
for (i in seq_along(out)) {
out[[i]] <- do.call(order, cols)[[1L]]
cols <- rev(lapply(cols, `is.na<-`, out[[i]]))
}
out
}
We assign values to NA
s to discharge them since NA
s will be sorted to the last in an ascending way. type
should be either 1 or -1 and is used to streamline the order we would like to impose since the descending order of c(1,2,3)
is the same as the ascending order of -1 * c(1,2,3)
. We also need a helper function as follows to transfer your group
s into 1 and -1
ab2sign <- function(x) {
out <- data.table::transpose(strsplit(x, "", fixed = TRUE))
lapply(out, function(x) 2L * (x == "b") - 1L)
}
Now we can apply them
df %>% group_by(group) %>% slice(alt_order(col1, col2, type = ab2sign(group)))
Output
# A tibble: 16 x 3
# Groups: group [4]
col1 col2 group
<int> <int> <chr>
1 4 3 aa
2 3 2 aa
3 4 1 aa
4 1 1 aa
5 5 3 ab
6 4 1 ab
7 3 2 ab
8 2 4 ab
9 1 4 ba
10 4 3 ba
11 3 1 ba
12 4 2 ba
13 1 2 bb
14 3 1 bb
15 2 3 bb
16 4 2 bb
Perhaps more efficient (perhaps vectorized) solutions are available.
Upvotes: 4
Reputation: 102770
Below is an option may work in general cases, i.e., more than 2 columns:
f <- function(.) {
col <- .[-length(.)] * (2 * (t(list2DF(strsplit(.$group, ""))) == "b") - 1)
r <- data.frame()
while (nrow(.)) {
p <- do.call(order, col[(seq_along(col) + nrow(r) - 1) %% length(col) + 1])[1]
r <- rbind(r, .[p, ])
col <- col[-p, ]
. <- .[-p, ]
}
r
}
df %>%
group_by(group) %>%
do(f(.)) %>%
ungroup()
which gives
col1 col2 col3 group
<int> <int> <int> <chr>
1 4 3 1 aaa
2 3 2 4 aaa
3 4 1 4 aaa
4 1 1 2 aaa
5 5 3 3 aab
6 2 4 1 aab
7 4 1 1 aab
8 3 2 2 aab
9 4 2 3 aba
10 3 1 2 aba
# ... with 22 more rows
Here is an option using dynamic programming (but maybe not that efficient)
f <- function(.) {
col <- with(., data.frame(col1, col2) * (2 * (t(list2DF(strsplit(.$group, ""))) == "b") - 1))
r <- data.frame()
while (nrow(.)) {
p <- do.call(order, ifelse(nrow(r) %% 2, rev, I)(col))[1]
r <- rbind(r, .[p, ])
col <- col[-p,]
. <- .[-p, ]
}
r
}
df %>%
group_by(group) %>%
do(f(.)) %>%
ungroup()
which gives
# A tibble: 16 x 3
col1 col2 group
<int> <int> <chr>
1 4 3 aa
2 3 2 aa
3 4 1 aa
4 1 1 aa
5 5 3 ab
6 4 1 ab
7 3 2 ab
8 2 4 ab
9 1 4 ba
10 4 3 ba
11 3 1 ba
12 4 2 ba
13 1 2 bb
14 3 1 bb
15 2 3 bb
16 4 2 bb
Upvotes: 2
Reputation: 5
I can tell you the answer, but I cannot write complete r
code for it, since I don't know r
, I hope some one may edit my code for a complete answer.
suppose both sorts are ascending (you can generalize it to your case)
idx1=order(col1)
idx2=order(col2[idx1])
return col1[idx1[idx2]], col2[idx1[idx2]]
Upvotes: -2