Deleting rows in a dataframe based on surrounding rows

Question

I have a dataframe with 6 columns and many rows that includes positions for an individual tagged fish. The structure is as follows:

head(tag.29912)

 Date.and.Time..UTC.    Receiver    Transmitter Latitude Longitude ndiffs29912
1    07/10/2010 15:53 VR2W-107619 A69-1303-29912 48.56225 -53.89144          NA
2    07/10/2010 15:56 VR2W-107619 A69-1303-29912 48.56225 -53.89144         180
3    07/10/2010 16:00 VR2W-107619 A69-1303-29912 48.56225 -53.89144         240
4    07/10/2010 16:24 VR2W-107619 A69-1303-29912 48.56225 -53.89144        1440
5    07/10/2010 16:45 VR2W-104556 A69-1303-29912 48.56460 -53.88956        1260
6    07/10/2010 16:47 VR2W-107619 A69-1303-29912 48.56225 -53.89144         120

The ndiffs29912 refers to the difference in time between detections - hence the first row has an NA because there is nothing previous to calculate a time difference with.

I would like to filter out any single detections that are more than 24 hours (86400 s) away from the detections before and after them, because these are likely spurious. I have tried the following code to remove them:

for (i in 1:length(tag.29912)) { 
if (tag.29912[i,6]>=86400 & tag.29912[i+1,6]>=86400) 
{rm(i)}

This has not worked. I have also tried:

for (i in 1:length(tag.29912)) { 
if (tag.29912[i,6]>=86400 & tag.29912[i+1,6]>=86400) 
{new<-tag.29912[i,]}
else{filteredtag.29912<-as.data.frame(tag.29912[-new])}
}

to no avail. Ultimately, I would like a new dataframe with all single detections removed. Any tips would be GREATLY appreciated!!

joran · Accepted Answer

A couple of things:

A data frame is a list with some special requirements (i.e. each element of the list must be of the same length). One consequence of this is that length(tag.29912) returns the length of the list, i.e. the number of columns, whereas your loop was presumably meant to run over the rows, which is what nrow(tag.29912) counts.
You can pull out all these rows using vectorization, which is very important to learn in R.
rm() removes objects from your workspace, which is not what you're trying to do.
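
To illustrate the first point, here is a minimal sketch on a toy data frame (the name df and its values are made up for illustration, not taken from the question's data):

```r
# Toy data frame (hypothetical values): 3 rows, 2 columns
df <- data.frame(x = 1:3, y = c("a", "b", "c"))

length(df)         # 2: the number of list elements, i.e. columns
ncol(df)           # 2
nrow(df)           # 3: the number of rows
seq_len(nrow(df))  # 1 2 3: an index that runs over the rows
```

So a loop written as for (i in 1:length(df)) visits only as many indices as there are columns, which is probably not what a row-wise loop intends.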

In your particular case you want to identify rows where the ndiffs29912 value is at least 86400 in two consecutive rows, and remove them.

So something like

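tag.29912$flag <- FALSE
# Loop over rows (not columns); start at 2 because row 1 has an NA time difference
for (i in 2:(nrow(tag.29912) - 1)) {
    # Both row i and row i+1 follow a gap of at least 24 hours
    if (tag.29912[i,6] >= 86400 && tag.29912[i+1,6] >= 86400) {
      tag.29912$flag[i] <- tag.29912$flag[i+1] <- TRUE
    }
}
# Keep only the rows that were not flagged
tag.29912 <- tag.29912[!tag.29912$flag,]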

should give you what you want.

Judging by this code, though, I strongly recommend that you take a few hours to work carefully through a basic R manual for beginners.
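
For comparison, the same kind of filter can be written without a loop. This is only a sketch under stated assumptions: the data frame d and its columns id and ndiffs are hypothetical stand-ins for tag.29912 and ndiffs29912, and a missing gap (the first row, or the row after the last one) is treated as a long gap. Unlike a loop that flags both row i and row i + 1, it flags only the detection that sits between two long gaps, so check which behaviour you actually want.

```r
# Hypothetical gaps in seconds; the first is NA, as in the question's data
d <- data.frame(id = 1:7,
                ndiffs = c(NA, 180, 90000, 100000, 240, 120, 95000))

gap_before <- d$ndiffs             # time since the previous detection
gap_after  <- c(d$ndiffs[-1], NA)  # time until the next detection

# Assumption: a missing gap (first or last row) counts as "long"
long_before <- is.na(gap_before) | gap_before >= 86400
long_after  <- is.na(gap_after)  | gap_after  >= 86400

# A detection is isolated when the gaps on both sides are long
isolated <- long_before & long_after
filtered <- d[!isolated, ]
```

With these made-up values, rows 3 and 7 are dropped: row 3 has a long gap on both sides, and row 7 is the last row and follows a long gap.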
