Reputation: 8626
Let's say I have the following data.table:
set.seed(123)
dt <- data.table (id=1:10,
group=sample(LETTERS[1:3], 10, replace=TRUE),
val=sample(1:100, 10, replace=TRUE),
ltr=sample(letters, 10),
col5=sample(100:200, 10)
)
setkey(dt, id)
(dt)
# id group val ltr col5
# 1: 1 A 96 x 197
# 2: 2 C 46 r 190
# 3: 3 B 68 p 168
# 4: 4 C 58 w 177
# 5: 5 C 11 o 102
# 6: 6 A 90 v 145
# 7: 7 B 25 k 172
# 8: 8 C 5 l 120
# 9: 9 B 33 f 129
# 10: 10 B 96 c 121
now I want to process it with grouping by group
, and in each group I would need to order records by val
column and then do some manipulations within each ordered group (for example, add a column with values from ltr
merged in order):
# id group val ltr letters
# 1 6 A 90 v v_x
# 2 1 A 96 x v_x
# 3 7 B 25 k k_f_p_c
# 4 9 B 33 f k_f_p_c
# 5 3 B 68 p k_f_p_c
# 6 10 B 96 c k_f_p_c
# 7 8 C 5 l l_o_r_w
# 8 5 C 11 o l_o_r_w
# 9 2 C 46 r l_o_r_w
# 10 4 C 58 w l_o_r_w
(in this example the whole table is ordered but this is not required)
That's how I imagine the code in general:
dt1 <- dt[,
{
# processing here, reorder somehow
# ???
# ...
list(id=id, ltr=ltr, letters=paste0(ltr,collapse="_"))
},
by=group]
Thanks in advance for any ideas!
UPD. As noted in answers, for my example I can simply order by group
and then by val
. And if I need to do several different orderings? For example, I want to sort by col5
and add col5diff
column which will show the difference of col5
values:
# id group val ltr col5 letters col5diff
# 1: 6 A 90 v 145 v_x
# 2: 1 A 96 x 197 v_x 52
# 3: 10 B 96 c 121 k_f_p_c
# 4: 9 B 33 f 129 k_f_p_c 8
# 5: 3 B 68 p 168 k_f_p_c 47
# 6: 7 B 25 k 172 k_f_p_c 51
# 7: 5 C 11 o 102 l_o_r_w
# 8: 8 C 5 l 120 l_o_r_w 18
# 9: 4 C 58 w 177 l_o_r_w 75
#10: 2 C 46 r 190 l_o_r_w 88
ok, for this example calculations of letters
and col5diff
are independent, so I can simply do them consecutively:
setkey(dt, "group", "val")
dt[, letters := paste(ltr, collapse="_"), by = group]
setkey(dt, "group", "col5")
dt<-dt[, col5diff:={
diff <- NA;
for (i in 2:length(col5)) {diff <- c(diff, col5[i]-col5[1]);}
diff; # updated to use := instead of list - thanks to comment of @Frank
}, by = group]
but I would be also glad to know what to do if I would need to use both of these orderings (in single {}
block).
Upvotes: 1
Views: 330
Reputation: 118779
Unless I'm missing something, this just requires setting the key
of your data.table
to group
and val
:
setkey(dt, "group", "val")
# id group val ltr col5
# 1: 6 A 90 v 145
# 2: 1 A 96 x 197
# 3: 7 B 25 k 172
# 4: 9 B 33 f 129
# 5: 3 B 68 p 168
# 6: 10 B 96 c 121
# 7: 8 C 5 l 120
# 8: 5 C 11 o 102
# 9: 2 C 46 r 190
# 10: 4 C 58 w 177
You see that the values are automatically ordered. Now you can subset by group
:
dt[, letters := paste(ltr, collapse="_"), by = group]
# id group val ltr col5 letters
# 1: 6 A 90 v 145 v_x
# 2: 1 A 96 x 197 v_x
# 3: 7 B 25 k 172 k_f_p_c
# 4: 9 B 33 f 129 k_f_p_c
# 5: 3 B 68 p 168 k_f_p_c
# 6: 10 B 96 c 121 k_f_p_c
# 7: 8 C 5 l 120 l_o_r_w
# 8: 5 C 11 o 102 l_o_r_w
# 9: 2 C 46 r 190 l_o_r_w
# 10: 4 C 58 w 177 l_o_r_w
Upvotes: 1
Reputation: 49810
I think you're just looking for order
dt[, letters:=paste(ltr[order(val)], collapse="_"), by=group]
dt[order(group, val)]
# id group val ltr col5 letters
# 1: 6 A 90 v 145 v_x
# 2: 1 A 96 x 197 v_x
# 3: 7 B 25 k 172 k_f_p_c
# 4: 9 B 33 f 129 k_f_p_c
# 5: 3 B 68 p 168 k_f_p_c
# 6: 10 B 96 c 121 k_f_p_c
# 7: 8 C 5 l 120 l_o_r_w
# 8: 5 C 11 o 102 l_o_r_w
# 9: 2 C 46 r 190 l_o_r_w
#10: 4 C 58 w 177 l_o_r_w
Or, if you do not want to add a column by reference:
dt[, list(id, val, ltr, letters=paste(ltr[order(val)], collapse="_")),
by=group][order(group, val)]
# group id val ltr letters
# 1: A 6 90 v v_x
# 2: A 1 96 x v_x
# 3: B 7 25 k k_f_p_c
# 4: B 9 33 f k_f_p_c
# 5: B 3 68 p k_f_p_c
# 6: B 10 96 c k_f_p_c
# 7: C 8 5 l l_o_r_w
# 8: C 5 11 o l_o_r_w
# 9: C 2 46 r l_o_r_w
#10: C 4 58 w l_o_r_w
Upvotes: 2