Reputation: 117
I have a data frame with three ID columns (trt, individual, and session):
trt <- rep(rep(c("A", "B", "C"), each = 3), times = 3)
individual <- rep(c("Bob", "Nancy", "Tim"), 9)
session <- rep(1:9, each = 3)
data <- rnorm(27, mean = 4, sd = 1)
# data.frame() keeps session and data numeric; cbind() would coerce every column to character
df <- data.frame(trt, individual, session, data)
df
trt individual session data
1 A Bob 1 4.36604594311893
2 A Nancy 1 3.29568979189961
3 A Tim 1 3.55849387209243
4 B Bob 2 5.41661201729216
5 B Nancy 2 4.7158873476798
6 B Tim 2 5.34401708530548
7 C Bob 3 4.54277206331273
8 C Nancy 3 3.53976115781019
9 C Tim 3 3.7954788384957
10 A Bob 4 4.75145309337952
11 A Nancy 4 4.7995601464568
12 A Tim 4 3.17821205815185
13 B Bob 5 3.62379779744325
14 B Nancy 5 4.07387328854209
15 B Tim 5 5.60156909861945
16 C Bob 6 4.06727142161431
17 C Nancy 6 4.59940289933985
18 C Tim 6 3.07543217234973
19 A Bob 7 2.63468285023662
20 A Nancy 7 3.22650587327078
21 A Tim 7 6.31062631711196
22 B Bob 8 4.69047076193906
23 B Nancy 8 4.79190101388308
24 B Tim 8 1.61906440409175
25 C Bob 9 2.85180524036416
26 C Nancy 9 3.43304058627408
27 C Tim 9 4.89263600498695
I want to create a new data frame containing one randomly drawn row for each trt x individual combination, under the constraint that each session number is selected only once.
This is what I want my dataframe to look like:
trt individual session data
2 A Nancy 1 3.29568979189961
4 B Bob 2 5.41661201729216
9 C Tim 3 3.7954788384957
10 A Bob 4 4.75145309337952
15 B Tim 5 5.60156909861945
17 C Nancy 6 4.59940289933985
21 A Tim 7 6.31062631711196
23 B Nancy 8 4.79190101388308
25 C Bob 9 2.85180524036416
I know how to randomly select one row from each trt x individual combination:
library(data.table)
setDT(df)
# Draw one random row within each trt x individual group
newdf <- df[, .SD[sample(.N, 1)], by = .(trt, individual)]
newdf
trt individual session data
1: A Bob 4 4.75145309337952
2: A Nancy 1 3.29568979189961
3: A Tim 7 6.31062631711196
4: B Bob 8 4.69047076193906
5: B Nancy **2** 4.7158873476798
6: B Tim **2** 5.34401708530548
7: C Bob 6 4.06727142161431
8: C Nancy 9 3.43304058627408
9: C Tim 3 3.7954788384957
But I don't know how to restrict the draws so that each session is selected only once (i.e., no duplicate sessions, such as session 2 above).
Thanks in advance for your help!
Upvotes: 3
Views: 103
Reputation: 4357
This iterates through the data.table row by row, so it might not be quick, but it doesn't require setting any parameters for the fields of interest:
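library(data.table)
set.seed(7)
setDT(df)
# Shuffle the rows so the pass below visits them in random order
dt1 <- df[sample(.N)]
dt1[, i := .I]       # row index after shuffling
dt1[, flag := NA]    # NA = undecided, TRUE = keep, FALSE = excluded
# Visit each row in turn. If it is still undecided, keep it and exclude every
# other row sharing its trt x individual combination or its session.
for (x in dt1$i) {
  dt1[is.na(flag[x]) & ((trt == trt[x] & individual == individual[x]) | session == session[x]),
      flag := i == x]
}
dt1[flag == TRUE]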
trt individual session data i flag
1: C Tim 9 3.63712332100071 1 TRUE
2: A Nancy 4 4.54908662150973 2 TRUE
3: A Tim 1 5.84217708521442 3 TRUE
4: B Tim 2 2.37343483362789 5 TRUE
5: C Nancy 3 2.87792051390258 7 TRUE
6: A Bob 7 3.45471592963754 12 TRUE
7: B Nancy 8 4.54792567807183 15 TRUE
8: C Bob 6 4.45667777212948 24 TRUE
9: B Bob 5 2.33285598638319 27 TRUE
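A result like the one above can be sanity-checked with a few lines of base R. This is only a minimal sketch: `is_valid_draw()` is a hypothetical helper (not part of data.table), and it assumes the columns are named `trt`, `individual`, and `session` as in the question. The toy data frames are made up for illustration.

```r
# Hypothetical helper: TRUE when no session and no trt x individual
# combination occurs more than once in `x`.
is_valid_draw <- function(x) {
  combos <- as.data.frame(x)[c("trt", "individual")]
  anyDuplicated(x$session) == 0 && anyDuplicated(combos) == 0
}

# Toy data for illustration only (not taken from the output above)
ok <- data.frame(trt = c("A", "A", "B"),
                 individual = c("Bob", "Tim", "Bob"),
                 session = c(1, 4, 2))
bad <- data.frame(trt = c("B", "B"),
                  individual = c("Nancy", "Tim"),
                  session = c(2, 2))  # session 2 drawn twice

is_valid_draw(ok)   # TRUE
is_valid_draw(bad)  # FALSE
```

Because the loop is greedy, data with a less regular layout than the question's could leave some combinations without a row, so it may also be worth comparing `nrow()` of the result against the number of distinct trt x individual combinations.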
Upvotes: 1