Reputation: 87
I have location data from GPS collars and I am trying to simulate, in R, different scenarios based on how the collars function. One of those simulations is that the collars miss some GPS points throughout the day (for various reasons). My data consists of 14 GPS points per day, and I want to randomly select (without replacement) at least 5 and at most 14 points per day.
In another simulation, I extracted 5 random points per day using this script from another thread here (R: Random sampling an even number of observations from a range of categories), but I do not fully understand all the different bits of the script that would allow me to alter it to get it to extract AT LEAST 5 points. Any advice most appreciated.
library(data.table)
dat2 <- data.table(dat.r)
dat2.ss <- dat2[ , .SD[sample(1:.N,min(5,.N))], by=DayNo]
Output from data-frame (dat.r)
dput(head(dat.r, 20))
structure(list(Latitude = c(5.4118432, 5.4118815, 5.4115713,
5.4111541, 5.4087853, 5.4083702, 5.4082527, 5.4078161, 5.4075528,
5.407321, 5.4070598, 5.4064237, 5.4070621, 5.4070251, 5.4070555,
5.4065127, 5.4065134, 5.4064872, 5.4056724, 5.4038751), Longitude = c(118.0225467,
118.0222841, 118.0211875, 118.0208637, 118.0205413, 118.0206064,
118.0204101, 118.0209272, 118.0213827, 118.0214189, 118.0217748,
118.0223343, 118.0227079, 118.0226511, 118.0226916, 118.0220733,
118.02218, 118.0221843, 118.0223316, 118.0198153), DayNo = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L,
2L, 2L, 2L)), .Names = c("Latitude", "Longitude", "DayNo"), row.names = c(NA,
20L), class = "data.frame")
Upvotes: 0
Reputation: 59415
This should work:
library(data.table)
set.seed(1) # for reproducible example
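setDT(dat.r)[, {
  lo <- min(5L, .N)   # days with fewer than 5 points keep all of them
  # sample.int() on the range length avoids sample(5:5, 1), which draws from 1:5
  n <- lo - 1L + sample.int(min(.N, 14L) - lo + 1L, 1L)
  .SD[sample.int(.N, n)]   # n distinct rows from this day
}, by = DayNo]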
# DayNo Latitude Longitude
# 1: 1 5.411881 118.0223
# 2: 1 5.411154 118.0209
# 3: 1 5.407553 118.0214
# 4: 1 5.411843 118.0225
# 5: 1 5.411571 118.0212
# 6: 1 5.407062 118.0227
# 7: 1 5.408785 118.0205
# 8: 1 5.408370 118.0206
# 9: 2 5.406513 118.0221
# 10: 2 5.407025 118.0227
# 11: 2 5.406513 118.0222
# 12: 2 5.405672 118.0223
# 13: 2 5.403875 118.0198
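The idea is that sample(x, n) takes a sample of size n from the vector 1:x (where x is a single number, not a vector). So you want n to be itself sampled from 5:min(.N,14). I added the possibility that there are fewer than five points in a given day, which is why the lower bound is min(5,.N) rather than 5. One caveat follows from the same rule: when a day has five or fewer points the range collapses to a single number, and sample(5:5, 1) draws from 1:5 rather than returning 5, so in that case n should be drawn with sample.int() on the length of the range instead.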
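To make the sample() behaviour described above concrete, here is a minimal base-R sketch. The helper draw_n() and its arguments lo and hi are hypothetical names introduced only for illustration; they are not part of data.table or of the answer's code. It assumes the number of points to keep should be uniform between the lower and upper bound.

```r
# Hypothetical helper (illustration only): choose how many points to keep
# for a day with `n_points` recorded fixes, between `lo` and `hi` inclusive.
draw_n <- function(n_points, lo = 5L, hi = 14L) {
  lo <- min(lo, n_points)   # a day with fewer than `lo` points keeps them all
  hi <- min(hi, n_points)   # cannot keep more points than were recorded
  # sample.int() on the range length sidesteps the single-number rule:
  # sample(5:5, 1) would draw from 1:5 instead of returning 5.
  lo - 1L + sample.int(hi - lo + 1L, 1L)
}

set.seed(1)
x         <- sample(10, 3)                      # three distinct values from 1:10
counts_14 <- replicate(1000, draw_n(14L))       # always between 5 and 14
counts_3  <- replicate(1000, draw_n(3L))        # always 3
single    <- replicate(1000, sample(5:5, 1))    # ranges over 1:5, not just 5
```

Drawing the count through sample.int() on the range length is one way to keep the "at least 5" guarantee on days that happen to have exactly five points or fewer.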
Upvotes: 2