Reputation: 45
I am trying to combine information from two dfs of eye tracking data. One df (behavioral) holds the start and end times of each trial in the experiment. The other df (gaze) holds the timestamp of each recorded gaze sample. I want to go through each gaze timestamp, check whether it falls within the start and end times of a trial (taken from the behavioral df), and if so, copy that trial's label from the behavioral df into the Trial column of the gaze df.
The dfs are as follows:
Behavioral df
StartTime EndTime Trial
1: 0 0.8 a
2: 1 1.8 b
3: 2 2.8 c
4: 3 3.8 d
Gaze df
Gaze x y Frame Trial
1: 0.00 100 200 126 NA
2: 0.20 101 201 126 NA
3: 0.40 102 202 127 NA
4: 0.80 103 203 127 NA
5: 0.60 104 204 127 NA
6: 0.90 105 205 127 NA
7: 1.20 106 206 128 NA
8: 1.40 107 207 128 NA
9: 1.60 108 208 128 NA
10: 2.02 109 209 129 NA
11: 2.50 110 210 129 NA
12: 2.90 111 211 129 NA
13: 3.10 112 212 130 NA
14: 3.79 113 213 130 NA
I want to go through the gaze timestamps. I.e., for Gaze$Gaze[1], is it between 0 and 0.8? Yes >>> Gaze$Trial[1]=a
I have tried
for(i in Gaze$Gaze){
if(as.numeric(Gaze$Gaze[i]) >= as.numeric(Behavior$StartTime[i])){
if(as.numeric(Gaze$Gaze[i]) <= as.numeric(Behavior$EndTime[i])){
Gaze$Trial[i]<-Behavior$Trial[i]
}
}
else Gaze$Trial[i]<-NA
}
I get the error:
Error in if (as.numeric(fakegaze$Gaze[i]) >= as.numeric(fakebehavior$StartTime[i])) { : argument is of length zero
I believe I might need to use another for loop to iterate through the two dfs separately before merging the information, but I'm not sure where to start. Thanks!
Data:
library(data.table)
beh = setDT(structure(list(StartTime = c(0, 1, 2, 3), EndTime = c(0.8, 1.8, 2.8, 3.8
), Trial = c("a", "b", "c", "d")), row.names = c(NA, -4L), class = "data.frame"))
gaze = setDT(structure(list(Gaze = c(0, 0.2, 0.4, 0.8, 0.6, 0.9, 1.2, 1.4,
1.6, 2.02, 2.5, 2.9, 3.1, 3.79), x = 100:113, y = 200:213, Frame = c(126L,
126L, 127L, 127L, 127L, 127L, 128L, 128L, 128L, 129L, 129L, 129L,
130L, 130L), Trial = c(NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA)), row.names = c(NA, -14L), class = "data.frame"))
Upvotes: 3
Views: 68
Reputation: 66819
You can use a non-equi join to update Trial in the gaze table:
gaze[, Trial := beh[.SD, on=.(StartTime <= Gaze, EndTime >= Gaze), x.Trial]]
Gaze x y Frame Trial
1: 0.00 100 200 126 a
2: 0.20 101 201 126 a
3: 0.40 102 202 127 a
4: 0.80 103 203 127 a
5: 0.60 104 204 127 a
6: 0.90 105 205 127 <NA>
7: 1.20 106 206 128 b
8: 1.40 107 207 128 b
9: 1.60 108 208 128 b
10: 2.02 109 209 129 c
11: 2.50 110 210 129 c
12: 2.90 111 211 129 <NA>
13: 3.10 112 212 130 d
14: 3.79 113 213 130 d
This approach assumes that there are no overlapping intervals in beh (in which case the right Trial could be ambiguous).
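One way to check that assumption before joining is sketched below. This is only a rough check, not part of the join itself: has_overlap is a hypothetical helper name, and it treats intervals as closed (touching endpoints count as overlapping) because the join uses <= and >=.

```r
library(data.table)

# Trial intervals from the question
beh <- data.table(
  StartTime = c(0, 1, 2, 3),
  EndTime   = c(0.8, 1.8, 2.8, 3.8),
  Trial     = c("a", "b", "c", "d")
)

# has_overlap() is a hypothetical helper, not a data.table function.
# After sorting by StartTime, any pair of overlapping closed intervals
# implies that some interval starts at or before the end of the one
# immediately preceding it, so comparing neighbours is enough.
has_overlap <- function(dt) {
  o <- dt[order(StartTime)]
  any(o$StartTime[-1] <= o$EndTime[-nrow(o)])
}

has_overlap(beh)
```

If this returns TRUE, a gaze timestamp could match more than one trial and the join result would need a closer look.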
(OP didn't tag the question with data.table or include the library(data.table) call, but I'm assuming they're using it based on how the tables were printed.)
As a workaround for the ".SD is locked" error bug, I usually use copy(.SD) as recommended in the error message. However, as the OP pointed out in the comments, this can be expensive with large data. An alternative that is usually equivalent is to flip the join around:
# convert to correct NA type
gaze[, Trial := rep(beh$Trial[NA_integer_], .N)]
# reversed update join
gaze[beh, on=.(Gaze >= StartTime, Gaze <= EndTime), Trial := i.Trial]
For the OP's case, it still seems to produce the right result. I usually avoid this kind of join because I find it harder to read and it can have strange side effects. In particular, in x[i, on=, v := i.v], if multiple rows of i map to the same row of x, only the last matching row will be used (with no warning or error).
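To compare the two directions on the question's data, here is a minimal self-contained sketch. Two things in it are my own choices rather than part of the answer above: the forward lookup uses the gaze table directly as the inner table instead of .SD, and the reversed update runs on a copy (rev_dt, an illustrative name) so the original gaze table stays untouched.

```r
library(data.table)

# Data from the question (Frame and the empty Trial column omitted for brevity)
beh <- data.table(
  StartTime = c(0, 1, 2, 3),
  EndTime   = c(0.8, 1.8, 2.8, 3.8),
  Trial     = c("a", "b", "c", "d")
)
gaze <- data.table(
  Gaze = c(0, 0.2, 0.4, 0.8, 0.6, 0.9, 1.2, 1.4, 1.6, 2.02, 2.5, 2.9, 3.1, 3.79),
  x    = 100:113,
  y    = 200:213
)

# Forward direction: look up each gaze row in beh and keep beh's Trial.
# Gaze rows that fall outside every interval come back as NA.
forward <- beh[gaze, on = .(StartTime <= Gaze, EndTime >= Gaze), x.Trial]

# Reversed direction: start from an all-NA character column, then let
# each row of beh fill in the gaze rows that fall inside its interval.
rev_dt <- copy(gaze)
rev_dt[, Trial := rep(beh$Trial[NA_integer_], .N)]
rev_dt[beh, on = .(Gaze >= StartTime, Gaze <= EndTime), Trial := i.Trial]

# Compare the two results element by element, NAs included
identical(forward, rev_dt$Trial)
```

The comparison only says something about this data set, where the intervals do not overlap. With overlapping intervals the two directions can disagree, for the reason given above.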
Upvotes: 1