D Ward
D Ward

Reputation: 13

adding data from an irregular time series to a timeseries with 5-min timesteps

I have two time series which I would like to merge: activity is (almost) regular with 5 minute timesteps, the other, temperature, is an irregular time series, with timestamps which do not at any point match the timestamps in activity.

I would like add the column "temperature$temp" to the dataset activity, with NAs for times at which there are no temp records, and the actual temp records assigned to the closest timestamp in activity. (Alternatively to the closest prior or following timestamp).

Previously I used the approxfun function to interpolate the temperature data to match the activity timeseries, however this is not ideal and I would like to include only temperatures which were actually recorded.

I have so far been unable to modify the solutions of similar-seeming timeseries questions posted on stack overflow and elsewhere because they either assume that the timeseries will match at some times, or they aim for output which merges the time series so that the timestamps of both data sets are included, neither of which is the case here.

   activity <- structure(list(Date = structure(c(1350542219, 1350542519, 1350542819, 
   1350543119, 1350543419, 1350543719, 1350544019, 1350544319, 1350544619, 
   1350544919, 1350545219, 1350545519, 1350545819, 1350546119, 1350546419, 
   1350546719, 1350547019, 1350547319, 1350547619), class = c("POSIXct", 
   "POSIXt"), tzone = "GMT"), Activity = c(300, 300, 300, 300, 300, 
   300, 300, 207, 0, 0, 0, 0, 153, 300, 300, 300, 300, 300, 300)), .Names = c("Date", 
   "Activity"), row.names = 1220:1238, class = "data.frame")

   temperature <- structure(list(Date = structure(c(1350543180, 1350547140), class =       c("POSIXct", 
  "POSIXt"), tzone = "GMT"), temp = c(12.625, 12.5)), .Names = c("Date", 
  "temp"), row.names = 2:3, class = "data.frame")

output <- structure(list(Date = structure(c(1350542219, 1350542519, 1350542819, 
1350543119, 1350543419, 1350543719, 1350544019, 1350544319, 1350544619, 
1350544919, 1350545219, 1350545519, 1350545819, 1350546119, 1350546419, 
1350546719, 1350547019, 1350547319, 1350547619), class = c("POSIXct", 
"POSIXt"), tzone = "GMT"), Activity = c(300, 300, 300, 300, 300, 
300, 300, 207, 0, 0, 0, 0, 153, 300, 300, 300, 300, 300, 300), 
temp = c(NA, NA, NA, 12.625, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, 12.5, NA)), .Names = c("Date", "Activity", 
"temp"), row.names = 1220:1238, class = "data.frame")

I would greatly appreciate any help or advice you could give me.

Upvotes: 1

Views: 467

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 270170

The roll= facility in data.table can do this:

library(data.table)
activity.dt <- data.table(activity, key="Date")[,Date2:=Date]
temperature.dt <- data.table(temperature)

activity.dt[temperature.dt, list(Date=Date2, temp), roll=-Inf][
  activity.dt, list(Date, Activity, temp)]

giving:

                   Date Activity   temp
 1: 2012-10-18 06:36:59      300     NA
 2: 2012-10-18 06:41:59      300     NA
 3: 2012-10-18 06:46:59      300     NA
 4: 2012-10-18 06:51:59      300     NA
 5: 2012-10-18 06:56:59      300 12.625
 6: 2012-10-18 07:01:59      300     NA
 7: 2012-10-18 07:06:59      300     NA
 8: 2012-10-18 07:11:59      207     NA
 9: 2012-10-18 07:16:59        0     NA
10: 2012-10-18 07:21:59        0     NA
11: 2012-10-18 07:26:59        0     NA
12: 2012-10-18 07:31:59        0     NA
13: 2012-10-18 07:36:59      153     NA
14: 2012-10-18 07:41:59      300     NA
15: 2012-10-18 07:46:59      300     NA
16: 2012-10-18 07:51:59      300     NA
17: 2012-10-18 07:56:59      300     NA
18: 2012-10-18 08:01:59      300 12.500
19: 2012-10-18 08:06:59      300     NA

Upvotes: 0

Andrie
Andrie

Reputation: 179558

First, define a small helper function that determines the nearest index position:

findNearest <- function(x, y)which.min(abs(x - y))

Then use sapply with findNearest to create an index vector of your temperatures. Use this index vector to create the desired temperature vector

idx <- sapply(activity$Date, findNearest, temperature$Date)
activity$temp <- temperature$temp[idx]

The results:

head(activity)
                    Date Activity   temp
1220 2012-10-18 06:36:59      300 12.625
1221 2012-10-18 06:41:59      300 12.625
1222 2012-10-18 06:46:59      300 12.625
1223 2012-10-18 06:51:59      300 12.625
1224 2012-10-18 06:56:59      300 12.625
1225 2012-10-18 07:01:59      300 12.625


tail(activity)
                    Date Activity temp
1233 2012-10-18 07:41:59      300 12.5
1234 2012-10-18 07:46:59      300 12.5
1235 2012-10-18 07:51:59      300 12.5
1236 2012-10-18 07:56:59      300 12.5
1237 2012-10-18 08:01:59      300 12.5
1238 2012-10-18 08:06:59      300 12.5

Upvotes: 1

Related Questions