Reputation: 581
I am searching for a simple dplyr or data.table solution. I need to sort rows of a large data frame, but only have a solution with for loops.
Here is a minimum example:
A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(20)
df = data.frame(col1 = sample(c(A,B),8,1), col2 = sample(c(A,B),8,1), col3 = sample(c(A,B),8,1))
col1 col2 col3
1 B1 B1 A1
2 B2 B1 A5
3 A3 A5 B1
4 B3 B2 B3
5 A2 B2 A2
6 A1 A1 B2
7 A2 A3 A4
8 A5 A5 A1
The expected output should be:
col1 col2 col3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5
So, the order of the rows for the sort algorithm is c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5')
with one exception. If there is already one of the B's in the first column we continue with the A's.
The next problem is, that I have three more columns in the data frame with different numbers which should be rearranged in the same order as these three columns.
Upvotes: 2
Views: 113
Reputation: 39858
Might be more than a little bit too convoluted, but a dplyr
and purrr
option might be:
map2_dfr(.x = df %>%
group_split(cond = as.numeric(grepl("^B", col1))),
.y = list(vec1, vec2),
~ .x %>%
mutate(pmap_dfr(across(c(starts_with("col"), - pluck(select(.x, "cond"), 1))),
function(...) set_names(c(...)[order(match(c(...), .y))], names(c(...))))))
col1 col2 col3 cond
<chr> <chr> <chr> <dbl>
1 B1 A3 A5 0
2 B2 A2 A2 0
3 B2 A1 A1 0
4 A2 A3 A4 0
5 A1 A5 A5 0
6 B1 A1 B1 1
7 B2 A5 B1 1
8 B3 B2 B3 1
Upvotes: 2
Reputation: 51914
You can use apply
, factor
and sort
twice with different orders.
order1 = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5') #Main order
order2 = c('A1', 'A2', 'A3', 'A4', 'A5', 'B1', 'B2', 'B3') #Secondary order for rows with 1st column as "B"
startB <- grepl("B", df[, 1]) #Rows with 1st column being "B"
df <- data.frame(t(apply(df, 1, \(x) sort(factor(x, levels = order1)))))
df[startB, -1] <- t(apply(df[startB, ], 1, \(x) sort(factor(x[-1], levels = order2))))
output
X1 X2 X3
1 B1 A1 B1
2 B1 A5 B2
3 B1 A3 A5
4 B2 B3 B3
5 B2 A2 A2
6 B2 A1 A1
7 A2 A3 A4
8 A1 A5 A5
Upvotes: 2
Reputation: 581
My solution so far:
A = c('A1', 'A2', 'A3', 'A4', 'A5')
B = c('B1', 'B2', 'B3')
set.seed(100)
N = 20
df_1 = data.frame(col1 = sample(c(A,B),N,1), col2 = sample(c(A,B),N,1), col3 = sample(c(A,B),N,1))
vec = c('B1', 'B2', 'B3', 'A1', 'A2', 'A3', 'A4', 'A5')
df_2 = t(apply(df_1,1,function(x)match(x,vec)))
df_3 = t(apply(df_2,1,sort))
tr = rowSums(matrix(df_3 %in% c(1,2,3),nrow(df_3), ncol(df_3))) == 2
change = which((df_3[,2]*tr)!=0)
save = df_3[change,2]
df_3[change,2] = df_3[change,3]
df_3[change,3] = save
df_4 = matrix(vec[df_3],nrow(df_3),ncol(df_3))
from df_2
to df_3
the place of the number is changing and I can rearrange the other columns by that.
Looks a little bit complicated
Upvotes: 1