Richard Erickson

Reputation: 2616

Vectorizing a for loop with data.table when comparing across multiple rows

In a nutshell, I am trying to vectorize my data.table code and remove two for loops. Specifically, I am comparing values across two different rows and cannot figure out how to vectorize that comparison. Here are the details:

I am trying to count the number of times fish move across a line, given the fish's coordinates. I only care about one-way movement (e.g., north-to-south but not south-to-north). The actual data are two-dimensional and have hundreds of thousands of observations. I have created a one-dimensional, reproducible example.

I have looked through the data.table FAQ and searched SO for "vectorize data.table". If I am not "asking the right question" (i.e., searching with the correct terms), I would appreciate pointers on what I should be searching for to solve my problem.

Here's my example and what I am currently doing:

library(data.table)
dt = data.table(
    fish = rep(c("a", "b"), each = 4),
    time = rep(c(1:4),  2),
    location = c(1, 1, 2, 2, 1, 1, 1, 1))

crossLine = 1.5 # Coordinates that I care about
dt[ , Cross := 0] ## did the fish cross the line during the previous time step?

fishes = dt[ , unique(fish)]

for(fishIndex in fishes){ # loop through each fish
    sampleTime = dt[ fishIndex == fish, time]
    nObs = length(sampleTime)
    ## In the real dataset, the no. of observations varies by fish
    for(timeIndex in 1:(nObs - 1)){ #loop through each time point
      if(dt[ fishIndex     == fish  & sampleTime[timeIndex] == time, 
             location <=  crossLine] &  
         dt[ fishIndex     == fish  & sampleTime[timeIndex + 1] == time, 
             location >  crossLine]
         ){dt[ fishIndex == fish & time == sampleTime[timeIndex + 1], 
               Cross := 1] # record if the fish crossed the line
          } 
        }
}

My ideal solution would look something like this:

moveCheck <- Vectorize(function(...))
dt[ , Cross := moveCheck(location, fish)] 

fish is passed to the function to make sure I do not accidentally record movement when transitioning from one fish to the next.

So, here is my question: how can I use data.table syntax to improve the performance of this code and remove the loops?

Upvotes: 0

Views: 204

Answers (1)

eddi

Reputation: 49448

Does this work for you (it does for the OP's example, but I'm not sure how representative that is)?

dt[, cross := c(0, diff(location >= crossLine) > 0), by = fish]
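To see why the one-liner works, here is a minimal sketch on the question's example data. It is a variant, not the exact line above: it uses a strict `>` and the column name `Cross` so that it matches the loop in the question (the line above uses `>=` and `cross`, which differs only when a location equals `crossLine` exactly). It also assumes rows should be ordered by `time` within each fish. The second column, `Cross2`, is a hypothetical name for an equivalent formulation with `shift()`, assuming a data.table version that provides it.

```r
library(data.table)

dt <- data.table(
    fish = rep(c("a", "b"), each = 4),
    time = rep(1:4, 2),
    location = c(1, 1, 2, 2, 1, 1, 1, 1))
crossLine <- 1.5

# Make sure consecutive rows are consecutive observations of the same fish
setorder(dt, fish, time)

# location > crossLine is TRUE once the fish is past the line.
# diff() on that logical vector gives +1 for FALSE -> TRUE (a crossing in the
# direction we care about), -1 for TRUE -> FALSE, and 0 for no change.
# The leading 0 pads the first observation of each fish, which has no
# previous time step; by = fish keeps the comparison within one fish.
dt[, Cross := c(0, diff(location > crossLine) > 0), by = fish]

# Equivalent sketch that mirrors the loop's two conditions directly:
# previous location on or below the line AND current location above it.
# fill = Inf makes the first row of each fish fail the "previous" test.
dt[, Cross2 := as.numeric(shift(location, fill = Inf) <= crossLine &
                          location > crossLine), by = fish]

print(dt)
```

On this example only fish "a" at time 3 is flagged. The `shift()` form may be easier to extend to the two-dimensional case, since each coordinate's previous and current values can be compared in the same expression.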

Upvotes: 4
