Reputation: 181
I am working with GPS data: the animal's position was recorded every 4 hours where possible. The data look like this (the XY coordinates are omitted here):
   ID  TIME           POSIXTIME  date_only
1   1 12:00 2005-05-08 12:00:00 2005-05-08
2   2 16:01 2005-05-08 16:01:00 2005-05-08
3   3 20:01 2005-05-08 20:01:00 2005-05-08
4   4  0:01 2005-05-09 00:01:00 2005-05-09
5   5  8:01 2005-05-09 08:01:00 2005-05-09
6   6 12:01 2005-05-09 12:01:00 2005-05-09
7   7 16:02 2005-05-09 16:02:00 2005-05-09
8   8 20:02 2005-05-09 20:02:00 2005-05-09
9   9  0:01 2005-05-10 00:01:00 2005-05-10
10 10  4:00 2005-05-10 04:00:00 2005-05-10
I would now like to keep only the first location of each day. In most cases this will be the fix at 0:01, but because some data are missing it will sometimes be 4:01 or even later. How can I extract only the first location per day? The result should go into a new data frame. I tried:
tapply(as.numeric(Kandularaw$TIME),list(Kandularaw$date_only),min, na.rm=T)
However, this did not work, because R returns strange values when TIME is converted to numeric. Is it possible to do this with an ifelse statement? If so, roughly what would it look like? I am grateful for any help I can get. Thank you for your efforts.
Cheers,
Jan
Upvotes: 1
Views: 2174
Reputation: 174813
I would approach this from a simpler point of view. First, ensure that POSIXTIME is one of the "POSIX" classes. Then order the data by POSIXTIME. At this point we can use any of the split-apply-combine idioms to do what you want, making use of the head() function. Here I use aggregate().
Using this example data set:
dat <- structure(list(ID = 1:10, TIME = structure(c(4L, 6L, 8L, 1L,
3L, 5L, 7L, 9L, 1L, 2L), .Label = c("00:01:00", "04:00:00", "08:01:00",
"12:00:00", "12:01:00", "16:01:00", "16:02:00", "20:01:00", "20:02:00"
), class = "factor"), POSIXTIME = structure(1:10, .Label = c("2005/05/08 12:00:00",
"2005/05/08 16:01:00", "2005/05/08 20:01:00", "2005/05/09 00:01:00",
"2005/05/09 08:01:00", "2005/05/09 12:01:00", "2005/05/09 16:02:00",
"2005/05/09 20:02:00", "2005/05/10 00:01:00", "2005/05/10 04:00:00"
), class = "factor"), date_only = structure(c(1L, 1L, 1L, 2L,
2L, 2L, 2L, 2L, 3L, 3L), .Label = c("2005/05/08", "2005/05/09",
"2005/05/10"), class = "factor")), .Names = c("ID", "TIME", "POSIXTIME",
"date_only"), class = "data.frame", row.names = c(NA, 10L))
First, get POSIXTIME and date_only in the correct formats:
dat <- transform(dat,
                 POSIXTIME = as.POSIXct(POSIXTIME, format = "%Y/%m/%d %H:%M:%S"),
                 date_only = as.Date(date_only, format = "%Y/%m/%d"))
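As a side note on why this conversion matters, here is a minimal sketch of what as.numeric() does to a factor. It assumes the TIME column in the question was read in as a factor, as it is in the example data; that is probably, though not certainly, the source of the "strange values". The vector tm is a made-up stand-in, not the real data, and the date and time zone used for parsing are arbitrary.

```r
# Hypothetical stand-in for a TIME column stored as a factor
tm <- factor(c("4:00", "8:01", "12:00", "16:01"))

# as.numeric() on a factor returns the internal level codes, not the times.
# Levels are sorted as text, so "12:00" and "16:01" come before "4:00".
codes <- as.numeric(tm)
codes
# 3 4 1 2

# Parsing the labels as date-times gives values that compare as expected
# (the date and the UTC time zone are arbitrary choices for this sketch)
parsed <- as.POSIXct(paste("2005-05-08", as.character(tm)),
                     format = "%Y-%m-%d %H:%M", tz = "UTC")
earliest <- as.character(tm)[which.min(parsed)]
earliest
# "4:00"
```

Taking min() of the codes would pick "12:00" here, which is why working on a proper date-time class is the safer route.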
Next, order by POSIXTIME:
dato <- with(dat, dat[order(POSIXTIME), ])
The final step is to use aggregate() to split the data by date_only and use head() to select the first row:
aggregate(dato[,1:3], by = list(date = dato$`date_only`), FUN = head, n = 1)
Notice that I pass the value 1 to the n argument of head(), indicating that it should extract only the first row of each day's observations. Because we sorted by date-time and split on date, the first row should be the first observation per day. Do be aware of rounding issues, however.
The final step results in:
> aggregate(dato[,1:3], by = list(date = dato$`date_only`), FUN = head, n = 1)
        date ID     TIME           POSIXTIME
1 2005-05-08  1 12:00:00 2005-05-08 12:00:00
2 2005-05-09  4 00:01:00 2005-05-09 00:01:00
3 2005-05-10  9 00:01:00 2005-05-10 00:01:00
Instead of dato[,1:3], refer to whatever columns in your original data set contain the variables (locations?) you wanted.
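The same sort-then-take-first idea can also be sketched without aggregate(), using duplicated() to keep the first row seen for each date. This is an alternative I am swapping in for illustration, not the method shown above; the data frame fixes and its values are hypothetical, and the UTC time zone is an assumption made so the dates are unambiguous.

```r
# Hypothetical miniature of the data: several fixes per day, deliberately unsorted
fixes <- data.frame(
  ID = 1:5,
  POSIXTIME = as.POSIXct(c("2005-05-09 08:01:00", "2005-05-08 16:01:00",
                           "2005-05-08 12:00:00", "2005-05-09 00:01:00",
                           "2005-05-10 04:00:00"), tz = "UTC")
)

# Derive the calendar day from the timestamp, in the same time zone
fixes$date_only <- as.Date(fixes$POSIXTIME, tz = "UTC")

# Sort chronologically, then drop every row whose date has already appeared
sorted <- fixes[order(fixes$POSIXTIME), ]
first_per_day <- sorted[!duplicated(sorted$date_only), ]
first_per_day$ID
# 3 4 5
```

One practical difference is that this returns whole rows of the original data frame, so any location columns come along without having to be listed.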
Upvotes: 1
Reputation: 263342
I am guessing you really want a row number as an index into a position record. If you know that these rows are ordered by date-time, and you are getting satisfactory group splits with that second argument to tapply (however it was created), then try this:
idx <- tapply(1:NROW(Kandularaw), Kandularaw$date_only, "[", 1)
If you want records (rows) in that same dataframe then just use:
Kandularaw[ idx, ]
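A minimal self-contained sketch of the two steps above, assuming the rows are already in date-time order. The data frame kd is a hypothetical stand-in for Kandularaw, and seq_len(nrow(kd)) is used in place of 1:NROW(...) only because it also behaves sensibly on an empty data frame.

```r
# Hypothetical stand-in for Kandularaw; rows are already in date-time order
kd <- data.frame(
  ID = 1:6,
  date_only = c("2005-05-08", "2005-05-08", "2005-05-09",
                "2005-05-09", "2005-05-09", "2005-05-10")
)

# For each day, take the first element of that day's row numbers
idx <- tapply(seq_len(nrow(kd)), kd$date_only, "[", 1)

# Subset into a new data frame holding one row per day
first_rows <- kd[idx, ]
first_rows$ID
# 1 3 6
```

If the rows are not already sorted, order them by the date-time column first; otherwise the first row of a group is not necessarily the earliest fix of that day.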
Upvotes: 1