Reputation: 651
The following is my data:
> x
day sum
1 2015-04-14 129
2 2015-04-15 129
3 2015-04-16 129
4 2015-04-17 899
5 2015-04-18 899
6 2015-04-19 899
7 2015-04-20 899
8 2015-04-21 899
9 2015-04-22 899
10 2015-04-23 899
11 2015-04-24 899
12 2015-04-25 899
13 2015-04-26 899
14 2015-04-27 899
15 2015-04-28 899
16 2015-04-29 899
17 2015-04-30 899
18 2015-05-01 899
19 2015-05-02 899
20 2015-05-03 899
21 2015-05-04 899
22 2015-05-05 899
23 2015-05-06 899
24 2015-05-07 899
25 2015-05-08 899
26 2015-05-09 899
27 2015-05-10 899
28 2015-05-11 899
29 2015-05-12 920
30 2015-05-13 920
31 2015-05-14 920
32 2015-05-15 920
33 2015-05-16 920
34 2015-05-17 920
35 2015-05-18 920
36 2015-05-19 920
37 2015-05-20 920
38 2015-05-21 920
39 2015-05-22 920
40 2015-05-23 920
41 2015-05-24 920
42 2015-05-25 920
43 2015-05-26 920
44 2015-05-27 920
45 2015-05-28 920
46 2015-05-29 920
47 2015-05-30 920
48 2015-05-31 920
49 2015-06-01 1076
I want to run a regression to find the date at which the sum will reach 6000. I did the following:
> q<-lm(day ~ sum, data=x)
> q
Call:
lm(formula = day ~ sum, data = x)
Coefficients:
(Intercept) sum
1.653e+04 3.584e-02
> as.Date(predict(q, data.frame(sum=6000)))
1
"2015-11-08"
I am able to predict dates, but I am not sure how accurate the prediction is and would like to improve it. However, I cannot view the summary of the regression. I get the following error:
> summary(q)
Error in Ops.difftime((f - mean(f)), 2) :
'^' not defined for "difftime" objects
In case the variable types matter:
> typeof(x)
[1] "list"
> typeof(x$day)
[1] "double"
> typeof(x$sum)
[1] "integer"
> class(x$day)
[1] "Date"
I had a look at a previous question,
Difftime Error using Looping Regressions in R
where the following was given as the solution:
I solved this by changing the index from time values to a standard integer index, and everything ran fine.
But I am not sure how to do this.
Can anybody tell me what I need to do to get the summary here?
Thanks
Upvotes: 4
Views: 6931
Reputation: 93791
Dates in R are actually just numeric values recast into a date format according to certain rules. For example, Date format is the number of days elapsed since Jan 1, 1970 and POSIXct format is the number of seconds elapsed since Jan 1, 1970 referenced to the UTC time zone. Here are a few examples:
as.numeric(as.Date("1970-01-01", tz="UTC"))
# 0
as.numeric(as.Date("1970-01-05", tz="UTC"))
# 4
as.numeric(as.POSIXct("1970-01-01 00:00:00", tz="UTC"))
# 0
as.numeric(as.POSIXct("1970-01-05 00:00:00", tz="UTC"))
# 345600
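As a small sketch of the same idea in the other direction, a numeric day count can be turned back into a Date by supplying the origin explicitly. The variable names (d, n, back, p, days.from.seconds) are only illustrative and are not part of the original data.

```r
# Date -> number of days since 1970-01-01
d <- as.Date("2015-06-01")
n <- as.numeric(d)
n
# 16587

# Number of days -> Date, using the same origin
back <- as.Date(n, origin = "1970-01-01")
back == d
# TRUE

# POSIXct counts seconds, so dividing by 24*60*60 gives days (in UTC)
p <- as.POSIXct("2015-06-01 00:00:00", tz = "UTC")
days.from.seconds <- as.numeric(p) / (24 * 60 * 60)
days.from.seconds == n
# TRUE
```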
One way to deal with your problem is to convert the dates to a numeric format, run the regression on the numeric data, and then convert back to date at the end.
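In the code below, I've assumed x$day starts out in POSIXct format. If x$day is of class Date instead (as the class(x$day) output in the question shows), as.numeric(x$day) already returns the number of days since Jan 1, 1970, so the division by 24*60*60 should be dropped.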
# Convert POSIXct date to a numeric value equal to number of days since Jan 1, 1970
x$day.numeric = as.numeric(x$day)/(24*60*60)
# Regression model
q <- lm(day.numeric ~ sum, data=x)
summary(q)
Call:
lm(formula = day.numeric ~ sum, data = x)
Residuals:
Min 1Q Median 3Q Max
-22.253 -10.253 1.747 9.994 20.994
Coefficients:
Estimate Std. Error t value Pr(>|t|)
(Intercept) 1.653e+04 8.448e+00 1956.996 < 2e-16 ***
sum 3.584e-02 9.550e-03 3.753 0.00048 ***
---
Signif. codes: 0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1
Residual standard error: 12.67 on 47 degrees of freedom
Multiple R-squared: 0.2306, Adjusted R-squared: 0.2142
F-statistic: 14.09 on 1 and 47 DF, p-value: 0.0004796
# Predict date at which sum = 6000. Use as.Date to convert numeric value
# back to a date based on the number of days since Jan 1, 1970.
as.Date(predict(q, data.frame(sum=6000)), origin="1970-01-01")
1
"2015-11-08"
Upvotes: 8
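Since the question also asks about accuracy, here is a minimal sketch of the same regression for the case where x$day is of class Date, with a prediction interval added. It assumes the data frame can be rebuilt from the printout in the question. The names fit, r2, pred and pred.dates are illustrative only.

```r
# Rebuild the question's data (assumption: transcribed from the printout)
x <- data.frame(
  day = seq(as.Date("2015-04-14"), by = "day", length.out = 49),
  sum = c(rep(129L, 3), rep(899L, 25), rep(920L, 20), 1076L)
)

# A Date is already a day count, so no division by 24*60*60 is needed
x$day.numeric <- as.numeric(x$day)

fit <- lm(day.numeric ~ sum, data = x)
r2 <- summary(fit)$r.squared

# Point prediction plus a 95% prediction interval for sum = 6000
pred <- predict(fit, data.frame(sum = 6000), interval = "prediction")

# Columns are fit, lwr and upr; convert each back to a Date
pred.dates <- as.Date(pred[1, ], origin = "1970-01-01")
print(pred.dates)
```

Because sum = 6000 lies far outside the observed range of sum (129 to 1076), the interval between lwr and upr is likely to be wide, which is one way of seeing how uncertain the extrapolated date is.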