istibar

Reputation: 33

R: reorder a data frame with groups while preserving order within groups

R coders! I have a data frame, plan, with two columns. One column, lab, holds group labels; the other, tr, contains only two distinct values.

lab <- rep(letters[1:2], each = 4)
tr <- c(1, 2, 2, 1, 1, 2, 1, 2)
plan <- data.frame(lab = lab, tr = tr)

> plan
  lab tr
1   a  1
2   a  2
3   a  2
4   a  1
5   b  1
6   b  2
7   b  1
8   b  2

I have another vector, order_new, which is a reordered version of lab.

order_new <- lab[sample(1:8)]

> order_new
[1] "b" "b" "a" "a" "b" "a" "b" "a"

I want to reorder the data frame so that the group labels follow the order given by order_new, while the tr values keep their original order within each lab group. The result I want is:

plan_new <- data.frame(order_new = order_new, tr = c(1, 2, 1, 2, 1, 2, 2, 1))
 
> plan_new
  order_new tr
1         b  1
2         b  2
3         a  1
4         a  2
5         b  1
6         a  2
7         b  2
8         a  1

The first row in the new data frame is a "b" value and so takes the first "b" value in the original data frame. Row 2, also a "b", takes the second "b" value in the original. The third row, an "a", takes the first "a" value in the original, and so on.

I can't find anything close enough in past answers to work this out and am really looking forward to someone helping me out with this!

Upvotes: 0

Views: 129

Answers (2)

user2974951

Reputation: 10375

If you don't mind a loop:

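# plan is the data frame from the question
order_new <- c("b", "b", "a", "a", "b", "a", "b", "a")

# one queue of tr values per group, in original order
tmp <- split(plan$tr, plan$lab)

res <- list()
for (x in seq_along(order_new)) {
  g <- order_new[x]
  res[[x]] <- tmp[[g]][1]        # take the next unused value of group g
  tmp[[g]] <- tail(tmp[[g]], -1) # drop it from that group's queue
}

data.frame(
  lab = order_new,
  tr = unlist(res)
)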

  lab tr
1   b  1
2   b  2
3   a  1
4   a  2
5   b  1
6   a  2
7   b  2
8   a  1
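
A loop-free variant of the same idea can be sketched with base R's unsplit(), which writes each group's values back into the positions where that group appears in a new grouping vector. This is only a sketch under two assumptions: order_new contains exactly the same labels as plan$lab, and each label occurs the same number of times in both. The name tr_new is made up for this example.

```r
# Data from the question, repeated so the snippet runs on its own
plan <- data.frame(lab = rep(letters[1:2], each = 4),
                   tr  = c(1, 2, 2, 1, 1, 2, 1, 2))
order_new <- c("b", "b", "a", "a", "b", "a", "b", "a")

# split() gives one vector of tr values per group, in original order;
# unsplit() places them at the positions of each group in order_new.
# Assumes identical labels and identical counts per label in both vectors.
tr_new <- unsplit(split(plan$tr, plan$lab), order_new)

plan_new <- data.frame(order_new = order_new, tr = tr_new)
plan_new
```

For the example data this should reproduce the tr column shown above, since the groups are matched by sorted label on both sides.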

Upvotes: 1

Wimpel

Reputation: 27732

Here is a data.table approach. It can easily be adapted into a dplyr or base R solution following the same logic. I included all intermediate results to show what each line does.

lab <- rep(letters[1:2], each = 4)
tr <- c(1, 2, 2, 1, 1, 2, 1, 2)
plan <- data.frame(lab = lab, tr = tr)
#hard coded, since sample is not reproducible without set.seed()
order_new <- c("b", "b", "a", "a", "b", "a", "b", "a")

library( data.table )
#make plan a data.table
setDT(plan)
#set row_id's by group (lab)
plan[, row_id := rowid( lab ) ]
#    lab tr row_id
# 1:   a  1      1
# 2:   a  2      2
# 3:   a  2      3
# 4:   a  1      4
# 5:   b  1      1
# 6:   b  2      2
# 7:   b  1      3
# 8:   b  2      4

#make a new data.table for the new ordering
plan_new <- data.table( order_new = order_new )
#also add rownumbers by group
plan_new[, row_id := rowid( order_new ) ][]
#    order_new row_id
# 1:         b      1
# 2:         b      2
# 3:         a      1
# 4:         a      2
# 5:         b      3
# 6:         a      3
# 7:         b      4
# 8:         a      4

#now join the tr-value from data.table 'plan' to 'plan_new', based on the group label and row_id
plan_new[ plan, tr := i.tr, on = .(order_new = lab, row_id) ]
#    order_new row_id tr
# 1:         b      1  1
# 2:         b      2  2
# 3:         a      1  1
# 4:         a      2  2
# 5:         b      3  1
# 6:         a      3  2
# 7:         b      4  2
# 8:         a      4  1

#drop the row_id column if needed
plan_new[, row_id := NULL ][]
#    order_new tr
# 1:         b  1
# 2:         b  2
# 3:         a  1
# 4:         a  2
# 5:         b  1
# 6:         a  2
# 7:         b  2
# 8:         a  1
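
For readers without data.table, the same logic (number the rows within each group on both sides, then match on group plus row number) can be sketched in base R. The helper occ() and the names key_old and key_new are made up for this illustration, and the sketch assumes the labels contain no spaces, so the pasted keys stay unambiguous.

```r
# Data from the question, repeated so the snippet runs on its own
plan <- data.frame(lab = rep(letters[1:2], each = 4),
                   tr  = c(1, 2, 2, 1, 1, 2, 1, 2))
order_new <- c("b", "b", "a", "a", "b", "a", "b", "a")

# Hypothetical helper: running count of each label (1st "a", 2nd "a", ...),
# the base R counterpart of data.table::rowid()
occ <- function(g) ave(seq_along(g), g, FUN = seq_along)

# Build "label occurrence" keys on both sides, e.g. "b 1", "b 2", "a 1"
key_old <- paste(plan$lab, occ(plan$lab))
key_new <- paste(order_new, occ(order_new))

# Look up each new key in the original data and pull its tr value
plan_new <- data.frame(order_new = order_new,
                       tr = plan$tr[match(key_new, key_old)])
plan_new
```

If order_new contains a label more often than plan does, match() returns NA for the extra rows, so those tr values come out as NA rather than raising an error.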

Upvotes: 1
