Reputation: 763
I am trying to use data.table
for an operation I don't if data.table
is designed for. Suppose I have the following
set.seed(1)
id<-rep(LETTERS[1:6], each=3)
event<-rep(letters[1:4], each=3, length.out=18)
order<-rep(1:3, 6)
var<-rnorm(18)
dt<-data.table(id, event, order, var)
setkey(dt, id)
id event order var
1: A a 1 -0.62645381
2: A a 2 0.18364332
3: A a 3 -0.83562861
4: B b 1 1.59528080
5: B b 2 0.32950777
6: B b 3 -0.82046838
7: C c 1 0.48742905
8: C c 2 0.73832471
9: C c 3 0.57578135
10: D d 1 -0.30538839
11: D d 2 1.51178117
12: D d 3 0.38984324
13: E a 1 -0.62124058
14: E a 2 -2.21469989
15: E a 3 1.12493092
16: F b 1 -0.04493361
17: F b 2 -0.01619026
18: F b 3 0.94383621
and I need to calculate in a new column the Euclidean distance between the values in var
having different id
but same event
type, maintaining the given order
of observations.
I understood that with the grouping by
command in the data.table
statement I divide the data into chunks which I can use in connection with other chunks from other data.table
s but can I do the same thing within the same data.table
?
To be clear, this is what I'd like to obtain
id event order var euclid
1: A a 1 -0.62645381 3.097720
2: A a 2 0.18364332 3.097720
3: A a 3 -0.83562861 3.097720
4: B b 1 1.59528080 2.433635
5: B b 2 0.32950777 2.433635
6: B b 3 -0.82046838 2.433635
7: C c 1 0.48742905 NA
8: C c 2 0.73832471 NA
9: C c 3 0.57578135 NA
10: D d 1 -0.30538839 NA
11: D d 2 1.51178117 NA
12: D d 3 0.38984324 NA
13: E a 1 -0.62124058 3.097720
14: E a 2 -2.21469989 3.097720
15: E a 3 1.12493092 3.097720
16: F b 1 -0.04493361 2.433635
17: F b 2 -0.01619026 2.433635
18: F b 3 0.94383621 2.433635
Many thanks!
Upvotes: 3
Views: 181
Reputation: 93813
Not sure about efficiency, but this should work at a basic level:
dt[, euclid:= dist(xtabs(var ~ id + order, data=.SD)), by=event]
# id event order var euclid
# 1: A a 1 -0.62645381 3.097720 # snip
# 4: B b 1 1.59528080 2.433635 # snip
# 7: C c 1 0.48742905 NA # snip
#10: D d 1 -0.30538839 NA # snip
#13: E a 1 -0.62124058 3.097720 # snip
#16: F b 1 -0.04493361 2.433635 # snip
To account for @Arun's issue where there are 3+ groups, you can sum the dist
outputs, though you will get 0 instead of NA's where there is only one group:
dt[, euclid := sum(dist(xtabs(var ~ id + order, data=.SD))), by=event]
Upvotes: 1