Reputation: 197
I have a large dataset with multiple values for specific days. There are missing values in the dataset as it's for a long period of time. Here's a small example:
set.seed(1)
data <- data.frame(
Date = sample(c("1993-07-09", "1993-07-09", "1993-07-10", "1993-08-11", "1993-08-11", "1993-08-11")),
Oxygen = sample(c(0.2, 0.4, 0.4, 0.2, 0.4, 0.5))
)
data$Date <- as.Date(data$Date)
I want to convert this dataframe into a ts object, so that I can forecast, use arima models, and eventually find outliers.
It specifically needs to be a ts object and not an xts object.
I'm facing two problems: 1) I don't know how to convert a data frame into a ts object, and 2) I don't know how to create a ts object that allows multiple values for a single day.
Any help would be greatly appreciated. Thank you!
Upvotes: 1
Views: 1331
Reputation: 270075
(1) mts: ts objects must be regularly spaced (i.e. the same amount of time between each successive point) and cannot represent dates directly, although numbers can be used as the time index. We therefore assume that the August dates were meant to be July, so that the dates are consecutive (this gives data2, defined at the end of this answer), and we use the number of days since the Epoch (January 1, 1970) as the time.
Add a sequence number to distinguish equal dates and split the series into multiple columns:
library(zoo)
data3 <- transform(data2, seq = ave(1:nrow(data2), Date, FUN = seq_along))
z <- read.zoo(data3, index = "Date", split = "seq")
as.ts(z)
giving:
Time Series:
Start = 8590
End = 8592
Frequency = 1
       1   2   3
8590 0.5 0.4  NA
8591 0.4  NA  NA
8592 0.2 0.2 0.4
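The time index 8590 to 8592 in this output is the day count since 1970-01-01. As a minimal base R sketch, the index can be mapped back to calendar dates. The matrix values are copied by hand from the printed output rather than recomputed, and `tt` and `dates` are names introduced here for illustration:

```r
# Rebuild the multivariate series by hand (values copied from the printed output).
tt <- ts(
  matrix(c(0.5, 0.4, NA,
           0.4, NA,  NA,
           0.2, 0.2, 0.4),
         ncol = 3, byrow = TRUE,
         dimnames = list(NULL, c("1", "2", "3"))),
  start = 8590, frequency = 1
)

# The time index counts days since the Epoch, so it converts straight back to Date.
dates <- as.Date(as.numeric(time(tt)), origin = "1970-01-01")
print(dates)
```

Keeping such a vector of dates alongside the ts object is one way to report results (for example, flagged outliers) by calendar date rather than by day number.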
(2) mean: Alternatively, average the values on equal dates:
z2 <- read.zoo(data2, index = "Date", aggregate = mean)
as.ts(z2)
giving:
Time Series:
Start = 8590
End = 8592
Frequency = 1
[1] 0.4500000 0.4000000 0.2666667
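If the real dates have gaps, as the original data does with its August dates, one possible approach (an assumption here, not something the code above does) is to average per day and then merge against a complete daily calendar, so that days without readings become NA. A minimal base R sketch follows; the data frame `d` is a hand-typed stand-in with illustrative values, and `daily`, `calendar`, `filled` and `daily_ts` are names introduced for this example:

```r
# Hand-typed stand-in data with irregular dates (illustrative values only).
d <- data.frame(
  Date   = as.Date(c("1993-07-09", "1993-07-09", "1993-07-10",
                     "1993-07-13", "1993-07-13")),
  Oxygen = c(0.5, 0.4, 0.4, 0.2, 0.4)
)

# One value per day: the mean of all readings on that date.
daily <- aggregate(Oxygen ~ Date, data = d, FUN = mean)

# Complete daily calendar; days without readings get NA after the merge.
calendar <- data.frame(Date = seq(min(daily$Date), max(daily$Date), by = "day"))
filled <- merge(calendar, daily, by = "Date", all.x = TRUE)

# Regularly spaced series indexed by days since 1970-01-01.
daily_ts <- ts(filled$Oxygen, start = as.numeric(filled$Date[1]), frequency = 1)
print(daily_ts)
```

With long gaps most of the series would be NA, in which case a coarser frequency such as monthly may be the more practical choice.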
(3) Ignore Date: We could ignore the Date column (as the poster suggested), in which case we just use 1, 2, 3, ... as the time index:
ts(data$Oxygen)
(4) 1st point each month: Since the poster indicated in a comment that there is a lot of data (20 years), we could take the first point in each month to form a monthly series:
as.ts(read.zoo(data, index = "Date", FUN = as.yearmon, aggregate = function(x) x[1]))
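For comparison, a monthly series can also be sketched in base R without zoo. This assumes every month between the first and the last has at least one reading (otherwise the missing months would need to be padded with NA first). The data frame `d` is hand-typed with illustrative values, and `month_key`, `first_vals`, `first_month` and `monthly_ts` are names introduced for this example:

```r
# Hand-typed stand-in data spanning two months (illustrative values only).
d <- data.frame(
  Date   = as.Date(c("1993-07-09", "1993-07-10", "1993-08-11", "1993-08-11")),
  Oxygen = c(0.5, 0.4, 0.2, 0.4)
)

# Sort by date so that "first" means the earliest reading in the month.
d <- d[order(d$Date), ]

# Group by year-month and keep the first reading of each group.
month_key  <- format(d$Date, "%Y-%m")
first_vals <- tapply(d$Oxygen, month_key, function(x) x[1])

# Start year and month taken from the first group label, e.g. "1993-07".
first_month <- as.integer(strsplit(names(first_vals)[1], "-")[[1]])
monthly_ts <- ts(as.numeric(first_vals), start = first_month, frequency = 12)
print(monthly_ts)
```

A frequency of 12 tells ts that the observations are months within a year, so the index is expressed in years and months rather than in day numbers.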
Note: The August dates have been changed to July to form data2, which is used in (1) and (2) above:
set.seed(1)
data2 <- data.frame(
Date = sample(c("1993-07-09", "1993-07-09", "1993-07-10",
"1993-07-11", "1993-07-11", "1993-07-11")),
Oxygen = sample(c(0.2, 0.4, 0.4, 0.2, 0.4, 0.5))
)
data2$Date <- as.Date(data2$Date)
Upvotes: 2