Reputation: 67
I'm trying to difference a time series, which looks like this: time series to difference (linked image). Unfortunately, diff(spread) returns this (linked image). I also tried diff(spread, 1). I nearly copy-pasted the code from a working example, and I can't find any obvious mistake. I installed the packages two hours ago, so I have the latest version of every package used.
# working directory path
setwd("C:/Users/Simon/Desktop/Projet serie temp")
#### Q1 ####
require(zoo)
require(tseries)
require(fUnitRoots)
data <- read.csv("base_form.csv",sep=",") #import .csv
View(data) # inspect the imported data
indice = data$Index
dates = data$Dates
spread <- zoo(indice, order.by=dates)
View(spread)
plot.window(ylim = c(-20,20))
plot(spread) # plot the series
dspread <- diff(spread) # first difference
plot(cbind(spread,dspread))
Here is the error I get:
> plot(dspread)
Error in plot.window(xlim = xlim, ylim = ylim, log = log, yaxs = pars$yaxs) :
need finite 'ylim' values
In addition: Warning messages:
1: In min(x) : no non-missing arguments to min; returning Inf
2: In max(x) : no non-missing arguments to max; returning -Inf
3: In min(x) : no non-missing arguments to min; returning Inf
4: In max(x) : no non-missing arguments to max; returning -Inf
Here is the output of dput(head(spread))
structure(c(83.87, 86.15, 94.07, 90.02, 92.22, 93.18), index = structure(1:6, .Label = c("1990-01",
"1990-02", "1990-03", "1990-04", "1990-05", "1990-06", "1990-07",
"1990-08", "1990-09", "1990-10", "1990-11", "1990-12", "1991-01",
"1991-02", "1991-03", "1991-04", "1991-05", "1991-06", "1991-07",
"1991-08", "1991-09", "1991-10", "1991-11", "1991-12", "1992-01",
"1992-02", "1992-03", "1992-04", "1992-05", "1992-06", "1992-07",
"1992-08", "1992-09", "1992-10", "1992-11", "1992-12", "1993-01",
"1993-02", "1993-03", "1993-04", "1993-05", "1993-06", "1993-07",
"1993-08", "1993-09", "1993-10", "1993-11", "1993-12", "1994-01",
"1994-02", "1994-03", "1994-04", "1994-05", "1994-06", "1994-07",
"1994-08", "1994-09", "1994-10", "1994-11", "1994-12", "1995-01",
"1995-02", "1995-03", "1995-04", "1995-05", "1995-06", "1995-07",
"1995-08", "1995-09", "1995-10", "1995-11", "1995-12", "1996-01",
"1996-02", "1996-03", "1996-04", "1996-05", "1996-06", "1996-07",
"1996-08", "1996-09", "1996-10", "1996-11", "1996-12", "1997-01",
"1997-02", "1997-03", "1997-04", "1997-05", "1997-06", "1997-07",
"1997-08", "1997-09", "1997-10", "1997-11", "1997-12", "1998-01",
"1998-02", "1998-03", "1998-04", "1998-05", "1998-06", "1998-07",
"1998-08", "1998-09", "1998-10", "1998-11", "1998-12", "1999-01",
"1999-02", "1999-03", "1999-04", "1999-05", "1999-06", "1999-07",
"1999-08", "1999-09", "1999-10", "1999-11", "1999-12", "2000-01",
"2000-02", "2000-03", "2000-04", "2000-05", "2000-06", "2000-07",
"2000-08", "2000-09", "2000-10", "2000-11", "2000-12", "2001-01",
"2001-02", "2001-03", "2001-04", "2001-05", "2001-06", "2001-07",
"2001-08", "2001-09", "2001-10", "2001-11", "2001-12", "2002-01",
"2002-02", "2002-03", "2002-04", "2002-05", "2002-06", "2002-07",
"2002-08", "2002-09", "2002-10", "2002-11", "2002-12", "2003-01",
"2003-02", "2003-03", "2003-04", "2003-05", "2003-06", "2003-07",
"2003-08", "2003-09", "2003-10", "2003-11", "2003-12", "2004-01",
"2004-02", "2004-03", "2004-04", "2004-05", "2004-06", "2004-07",
"2004-08", "2004-09", "2004-10", "2004-11", "2004-12", "2005-01",
"2005-02", "2005-03", "2005-04", "2005-05", "2005-06", "2005-07",
"2005-08", "2005-09", "2005-10", "2005-11", "2005-12", "2006-01",
"2006-02", "2006-03", "2006-04", "2006-05", "2006-06", "2006-07",
"2006-08", "2006-09", "2006-10", "2006-11", "2006-12", "2007-01",
"2007-02", "2007-03", "2007-04", "2007-05", "2007-06", "2007-07",
"2007-08", "2007-09", "2007-10", "2007-11", "2007-12", "2008-01",
"2008-02", "2008-03", "2008-04", "2008-05", "2008-06", "2008-07",
"2008-08", "2008-09", "2008-10", "2008-11", "2008-12", "2009-01",
"2009-02", "2009-03", "2009-04", "2009-05", "2009-06", "2009-07",
"2009-08", "2009-09", "2009-10", "2009-11", "2009-12", "2010-01",
"2010-02", "2010-03", "2010-04", "2010-05", "2010-06", "2010-07",
"2010-08", "2010-09", "2010-10", "2010-11", "2010-12", "2011-01",
"2011-02", "2011-03", "2011-04", "2011-05", "2011-06", "2011-07",
"2011-08", "2011-09", "2011-10", "2011-11", "2011-12", "2012-01",
"2012-02", "2012-03", "2012-04", "2012-05", "2012-06", "2012-07",
"2012-08", "2012-09", "2012-10", "2012-11", "2012-12", "2013-01",
"2013-02", "2013-03", "2013-04", "2013-05", "2013-06", "2013-07",
"2013-08", "2013-09", "2013-10", "2013-11", "2013-12", "2014-01",
"2014-02", "2014-03", "2014-04", "2014-05", "2014-06", "2014-07",
"2014-08", "2014-09", "2014-10", "2014-11", "2014-12", "2015-01",
"2015-02", "2015-03", "2015-04", "2015-05", "2015-06", "2015-07",
"2015-08", "2015-09", "2015-10", "2015-11", "2015-12", "2016-01",
"2016-02", "2016-03", "2016-04", "2016-05", "2016-06", "2016-07",
"2016-08", "2016-09", "2016-10", "2016-11", "2016-12", "2017-01",
"2017-02", "2017-03", "2017-04", "2017-05", "2017-06", "2017-07",
"2017-08", "2017-09", "2017-10", "2017-11", "2017-12", "2018-01",
"2018-02"), class = "factor"), class = "zoo")
Upvotes: 0
Views: 572
Reputation: 160447
I cannot reproduce the problem perfectly, but I have some thoughts.
TL;DR (edit): don't use factors; use either character or Date objects before zoo-ifying things.
I hunted this down by looking at the source for zoo:::diff.zoo. Namely, it was failing at
x - lag(x, k=-1)
# Data:
# numeric(0)
# Index:
# factor(0)
# 338 Levels: 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06 1990-07 1990-08 1990-09 1990-10 1990-11 1990-12 1991-01 1991-02 1991-03 1991-04 1991-05 1991-06 1991-07 1991-08 1991-09 1991-10 1991-11 1991-12 1992-01 1992-02 1992-03 1992-04 ... 2018-02
I believe that zoo objects are typically indexed by some form of time progression. This might be simple integers, as in
str(zoo(2:5))
# 'zoo' series from 1 to 4
# Data: int [1:4] 2 3 4 5
# Index: int [1:4] 1 2 3 4
or something more explicit/intentional, such as a Date or POSIXct timestamp. In your case, it's a factor. I don't know whether zoo is trying to treat it like an integer (probably not, otherwise it should have come up with something) or like a categorical character, which is most likely not what you want in a time series. (Correction: as 42- pointed out, a character index is actually quite fine.)
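As a quick check of why a factor index is awkward here, the sketch below compares the same labels as a factor and as character strings using base R only. The three labels are a hypothetical stand-in for the start of the question's index; it illustrates R's comparison rules, not zoo's internals.

```r
# Hypothetical stand-in for the first few index labels from the question
labels <- c("1990-01", "1990-02", "1990-03")

f  <- factor(labels)      # what read.csv yields when strings become factors
ch <- as.character(f)     # the same labels as plain character strings

# Unordered factors do not support "<": R returns NA and warns that
# '<' is not meaningful for factors
factor_cmp <- suppressWarnings(f[1] < f[2])
factor_cmp

# Character strings compare lexically, which agrees with chronological
# order for zero-padded "YYYY-MM" labels
char_cmp <- ch[1] < ch[2]
char_cmp
```

The character comparison is the behaviour an ordered index needs, which is why converting the factor before building the series is the common suggestion.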
So even if zoo intelligently deals with factors, there is also the problem that the date you have listed is not perfectly unambiguous (it is not a time-based object). For instance, by "1990-01" do you mean "1990-01-01"? Though it might seem intuitive and obvious to make that assumption, R typically does not follow you on that leap.
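A small base-R sketch of that ambiguity follows. The string is taken from the question's index; the choice of the first day of the month is an assumption made only to complete the date.

```r
# "1990-01" has no day component, so as.Date() cannot parse it on its own;
# tryCatch() captures the resulting error message instead of stopping
res <- tryCatch(as.character(as.Date("1990-01")),
                error = function(e) conditionMessage(e))
res

# Supplying only a year-month format does not produce a usable Date either
# (the result is typically NA because the day is still missing)
as.Date("1990-01", format = "%Y-%m")

# Appending an assumed day-of-month makes the string a complete date
d <- as.Date(paste0("1990-01", "-01"), format = "%Y-%m-%d")
d
```

Any day could be appended; the first of the month is simply a convention for monthly data.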
Try this:
(ind <- index(x))
# [1] 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06
# 338 Levels: 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06 1990-07 1990-08 1990-09 1990-10 1990-11 1990-12 ... 2018-02
(ind <- as.Date(paste0(ind, "-01"), format="%Y-%m-%d"))
# [1] "1990-01-01" "1990-02-01" "1990-03-01" "1990-04-01" "1990-05-01" "1990-06-01"
index(x) <- ind
(The surrounding parentheses are merely a shortcut to dump the output post-assignment. They can be safely removed for production.) That now allows
x - lag(x, k=-1)
# 1990-01-01 1990-02-01 1990-03-01 1990-04-01 1990-05-01 1990-06-01
# NA 2.28 7.92 -4.05 2.20 0.96
which means differencing your spread is likely working now:
diff(x)
# 1990-02-01 1990-03-01 1990-04-01 1990-05-01 1990-06-01
# 2.28 7.92 -4.05 2.20 0.96
If my guess is right, your data import should instead look like:
data <- read.csv("base_form.csv",sep=",") #import .csv
indice = data$Index
dates = as.Date(paste0(data$Dates, "-01"), format="%Y-%m-%d")
spread <- zoo(indice, order.by=dates)
or more simply
data <- read.csv("base_form.csv",sep=",")
dates = as.character(data$Dates)
or even more simply
data <- read.csv("base_form.csv",sep=",", stringsAsFactors=FALSE)
Upvotes: 2
Reputation: 263362
I'm posting to correct what I think are some inaccuracies in r2evans's analysis of the problem. It is true that the problem stems from using a factor as an index. The factor class in R does not support ordering operations, and at least one of the "o"s in the name "zoo" stands for "ordered". It could have been solved quickly by:
index(spread) <- as.character(index(spread))
Then the diff operation would have succeeded, and the cbind operation would also have succeeded, because there is a cbind.zoo function that recognizes differences in the number of rows and automagically pads the shorter columns with NAs at the beginning.
> cbind( diff(spread), spread )
diff(spread) spread
1990-01 NA 83.87
1990-02 2.28 86.15
1990-03 7.92 94.07
1990-04 -4.05 90.02
1990-05 2.20 92.22
1990-06 0.96 93.18
> cbind( diff(diff(spread)), spread )
diff(diff(spread)) spread
1990-01 NA 83.87
1990-02 NA 86.15
1990-03 5.64 94.07
1990-04 -11.97 90.02
1990-05 6.25 92.22
1990-06 -1.24 93.18
Character vectors are perfectly acceptable index classes for zoo. They will be ordered as lexical values. It's perfectly acceptable to apply a "<" or ">" operation to two character values, so there is no ambiguity in this case. The zoo package also has a yearmon class that this index could be converted to if desired.
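The yearmon option mentioned above might look like the following minimal sketch. It rebuilds the first six observations from the question's dput() output by hand; the variable names follow the question, and passing the format explicitly is a choice made here for clarity.

```r
library(zoo)

# First six observations from the question's dput() output
indice <- c(83.87, 86.15, 94.07, 90.02, 92.22, 93.18)
dates  <- c("1990-01", "1990-02", "1990-03", "1990-04", "1990-05", "1990-06")

# as.yearmon() turns "YYYY-MM" labels into a month-granularity time index
spread <- zoo(indice, order.by = as.yearmon(dates, format = "%Y-%m"))

# First difference: one observation shorter than the original series
dspread <- diff(spread)
dspread
```

With a yearmon index the series is ordered by time rather than by string comparison, and plots get month-aware axis labels.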
Upvotes: 1
Reputation: 2263
The problem appears to be that the dates are encoded as factors. Note the difference if we construct spread manually:
> indice <- c(83.87, 86.15, 94.07, 90.02, 92.22, 93.18)
> dates <- as.factor(c("1990-01", "1990-02", "1990-03", "1990-04", "1990-05", "1990-06"))
> spread <- zoo(indice, order.by = dates)
> diff(spread)
Data:
numeric(0)
Index:
factor(0)
Levels: 1990-01 1990-02 1990-03 1990-04 1990-05 1990-06
> dates <- c("1990-01", "1990-02", "1990-03", "1990-04", "1990-05", "1990-06")
> spread <- zoo(indice, order.by = dates)
> diff(spread)
1990-02 1990-03 1990-04 1990-05 1990-06
2.28 7.92 -4.05 2.20 0.96
To fix it, you can try adding stringsAsFactors = FALSE to your read.csv call.
data <- read.csv("base_form.csv", stringsAsFactors = FALSE)
(Note that sep = "," is the default for read.csv, so you don't really need to specify it.)
EDIT: I should add that there are many more zoo-like ways of reading dates in correctly; see https://cran.r-project.org/web/packages/zoo/vignettes/zoo-read.pdf
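As one example of such an approach, here is a minimal sketch using read.zoo() with a yearmon index. The temporary file is a hypothetical stand-in for base_form.csv; the real file's column order is not shown in the question, so placing Dates in the first column is an assumption.

```r
library(zoo)

# Hypothetical stand-in for base_form.csv (Dates-first layout is assumed)
tmp <- tempfile(fileext = ".csv")
writeLines(c("Dates,Index",
             "1990-01,83.87",
             "1990-02,86.15",
             "1990-03,94.07",
             "1990-04,90.02"), tmp)

# read.zoo() reads the file and builds the zoo object in one step;
# FUN is applied to the index column, which is the first column by default
spread <- read.zoo(tmp, header = TRUE, sep = ",", FUN = as.yearmon)

class(index(spread))   # the index is now a yearmon object, not a factor
dspread <- diff(spread)
dspread
```

If the index is not the first column of the real file, the index.column argument of read.zoo() can point at it instead.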
Upvotes: 1