Reputation: 13
I have two time series which I would like to merge: activity
is (almost) regular with 5 minute timesteps, the other, temperature
, is an irregular time series, with timestamps which do not at any point match the timestamps in activity
.
I would like add the column "temperature$temp" to the dataset activity
, with NAs for times at which there are no temp records, and the actual temp records assigned to the closest timestamp in activity
. (Alternatively to the closest prior or following timestamp).
Previously I used the approxfun function to interpolate the temperature data to match the activity timeseries, however this is not ideal and I would like to include only temperatures which were actually recorded.
I have so far been unable to modify the solutions of similar-seeming timeseries questions posted on stack overflow and elsewhere because they either assume that the timeseries will match at some times, or they aim for output which merges the time series so that the timestamps of both data sets are included, neither of which is the case here.
activity <- structure(list(Date = structure(c(1350542219, 1350542519, 1350542819,
1350543119, 1350543419, 1350543719, 1350544019, 1350544319, 1350544619,
1350544919, 1350545219, 1350545519, 1350545819, 1350546119, 1350546419,
1350546719, 1350547019, 1350547319, 1350547619), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), Activity = c(300, 300, 300, 300, 300,
300, 300, 207, 0, 0, 0, 0, 153, 300, 300, 300, 300, 300, 300)), .Names = c("Date",
"Activity"), row.names = 1220:1238, class = "data.frame")
temperature <- structure(list(Date = structure(c(1350543180, 1350547140), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), temp = c(12.625, 12.5)), .Names = c("Date",
"temp"), row.names = 2:3, class = "data.frame")
output <- structure(list(Date = structure(c(1350542219, 1350542519, 1350542819,
1350543119, 1350543419, 1350543719, 1350544019, 1350544319, 1350544619,
1350544919, 1350545219, 1350545519, 1350545819, 1350546119, 1350546419,
1350546719, 1350547019, 1350547319, 1350547619), class = c("POSIXct",
"POSIXt"), tzone = "GMT"), Activity = c(300, 300, 300, 300, 300,
300, 300, 207, 0, 0, 0, 0, 153, 300, 300, 300, 300, 300, 300),
temp = c(NA, NA, NA, 12.625, NA, NA, NA, NA, NA, NA, NA,
NA, NA, NA, NA, NA, NA, 12.5, NA)), .Names = c("Date", "Activity",
"temp"), row.names = 1220:1238, class = "data.frame")
I would greatly appreciate any help or advice you could give me.
Upvotes: 1
Views: 467
Reputation: 270170
The roll=
facility in data.table can do this:
library(data.table)
activity.dt <- data.table(activity, key="Date")[,Date2:=Date]
temperature.dt <- data.table(temperature)
activity.dt[temperature.dt, list(Date=Date2, temp), roll=-Inf][
activity.dt, list(Date, Activity, temp)]
giving:
Date Activity temp
1: 2012-10-18 06:36:59 300 NA
2: 2012-10-18 06:41:59 300 NA
3: 2012-10-18 06:46:59 300 NA
4: 2012-10-18 06:51:59 300 NA
5: 2012-10-18 06:56:59 300 12.625
6: 2012-10-18 07:01:59 300 NA
7: 2012-10-18 07:06:59 300 NA
8: 2012-10-18 07:11:59 207 NA
9: 2012-10-18 07:16:59 0 NA
10: 2012-10-18 07:21:59 0 NA
11: 2012-10-18 07:26:59 0 NA
12: 2012-10-18 07:31:59 0 NA
13: 2012-10-18 07:36:59 153 NA
14: 2012-10-18 07:41:59 300 NA
15: 2012-10-18 07:46:59 300 NA
16: 2012-10-18 07:51:59 300 NA
17: 2012-10-18 07:56:59 300 NA
18: 2012-10-18 08:01:59 300 12.500
19: 2012-10-18 08:06:59 300 NA
Upvotes: 0
Reputation: 179558
First, define a small helper function that determines the nearest index position:
findNearest <- function(x, y)which.min(abs(x - y))
Then use sapply
with findNearest
to create an index vector of your temperatures. Use this index vector to create the desired temperature vector
idx <- sapply(activity$Date, findNearest, temperature$Date)
activity$temp <- temperature$temp[idx]
The results:
head(activity)
Date Activity temp
1220 2012-10-18 06:36:59 300 12.625
1221 2012-10-18 06:41:59 300 12.625
1222 2012-10-18 06:46:59 300 12.625
1223 2012-10-18 06:51:59 300 12.625
1224 2012-10-18 06:56:59 300 12.625
1225 2012-10-18 07:01:59 300 12.625
tail(activity)
Date Activity temp
1233 2012-10-18 07:41:59 300 12.5
1234 2012-10-18 07:46:59 300 12.5
1235 2012-10-18 07:51:59 300 12.5
1236 2012-10-18 07:56:59 300 12.5
1237 2012-10-18 08:01:59 300 12.5
1238 2012-10-18 08:06:59 300 12.5
Upvotes: 1