itcplpl

Reputation: 780

plotting time series in R

I am working with data, 1st two columns are dates, 3rd column is symbol, and 4th and 5th columns are prices. So, I created a subset of the data as follows:

test.sub <- subset(test, V3 == "GOOG", select = c(V1, V4))

and then I try to plot a time series chart using the following

as.ts(test.sub)
plot(test.sub)

Well, it gives me a scatter plot, which is not what I was looking for. So I tried plot(test.sub[1],test.sub[2]), and now I get the following error:

Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ

To make sure the number of rows was the same, I ran nrow(test.sub[1]) and nrow(test.sub[2]), and they both return the same number of rows. As a newcomer to R, I am not sure what the fix is.

I also ran plot.ts(test.sub) and that works, but it doesn't show me the dates in the x-axis, which it was doing with plot(test.sub) and which is what I would like to see.

test.sub[1]
              V1
1107 2011-Aug-24
1206 2011-Aug-25
1307 2011-Aug-26
1408 2011-Aug-29
1510 2011-Aug-30
1613 2011-Aug-31
1718 2011-Sep-01
1823 2011-Sep-02
1929 2011-Sep-06
2035 2011-Sep-07
2143 2011-Sep-08
2251 2011-Sep-09
2359 2011-Sep-13
2470 2011-Sep-14
2581 2011-Sep-15
2692 2011-Sep-16
2785 2011-Sep-19
2869 2011-Sep-20
2965 2011-Sep-21
3062 2011-Sep-22
3160 2011-Sep-23
3258 2011-Sep-26
3356 2011-Sep-27
3455 2011-Sep-28
3555 2011-Sep-29
3655 2011-Sep-30
3755 2011-Oct-03
3856 2011-Oct-04
3957 2011-Oct-05
4059 2011-Oct-06
4164 2011-Oct-07
4269 2011-Oct-10
4374 2011-Oct-11
4479 2011-Oct-12
4584 2011-Oct-13
4689 2011-Oct-14

str(test.sub)
'data.frame':   35 obs. of  2 variables:
 $ V1:Class 'Date'  num [1:35] NA NA NA NA NA NA NA NA NA NA ...
 $ V4: num  0.475 0.452 0.423 0.418 0.403 ...

head(test.sub)
       V1       V4
1212 <NA> 0.474697
1313 <NA> 0.451907
1414 <NA> 0.423184
1516 <NA> 0.417709
1620 <NA> 0.402966
1725 <NA> 0.414264

Now that this is working, I'd like to add a 3rd variable to plot a 3D chart. Any suggestions on how I can do that? Thanks!

Upvotes: 2

Views: 12099

Answers (3)

Gavin Simpson

Reputation: 174968

The reason you get the error about different x and y lengths becomes apparent if you run traceback() immediately after the error is raised:

> plot(test.sub[1],test.sub[2])
Error in xy.coords(x, y, xlabel, ylabel, log) : 
  'x' and 'y' lengths differ
> traceback()
6: stop("'x' and 'y' lengths differ")
5: xy.coords(x, y, xlabel, ylabel, log)
4: plot.default(x1, ...)
3: plot(x1, ...)
2: plot.data.frame(test.sub[1], test.sub[2])
1: plot(test.sub[1], test.sub[2])

The problems in your call are manifold. First, as mentioned by @mweylandt, test.sub[1] is a data frame with a single component, not a vector comprising the contents of the first component of test.sub.

From the traceback, we see that the plot.data.frame method was called. R is quite happy to plot a data frame as long as it has at least two columns. R took you at your word and passed test.sub[1] (as a data.frame) on to plot() - test.sub[2] never gets a look in. test.sub[1] is eventually passed on to xy.coords() which correctly informs you that you have lots of rows for x but 0 rows for y because test.sub[1] only contains a single component.

It would have worked if you'd done plot(test.sub[,1], test.sub[,2], type = "l") or used the formula interface to name the variables plot(V4 ~ V1, data = test.sub, type = "l") as I show in my other Answer.

Upvotes: 2

Gavin Simpson

Reputation: 174968

Surely it is easier to use the formula interface:

> test <- data.frame(End = Sys.Date()+1:5, 
+                Start = Sys.Date()+0:4, 
+                tck = rep("GOOG",5), 
+                EndP= 1:5, 
+                StartP= 0:4)
> 
> test.sub = subset(test, tck=="GOOG",select = c(End, EndP))
> head(test.sub)
         End EndP
1 2011-10-19    1
2 2011-10-20    2
3 2011-10-21    3
4 2011-10-22    4
5 2011-10-23    5
> plot(EndP ~ End, data = test.sub, type = "l")

I work extensively with time series type data and rarely, if ever, have any need for the "ts" class of objects. Packages zoo and xts are very useful, but if all you want to do is plot the data, i) get the date/time information correctly formatted/set-up as a "Date" or "POSIXt" class object, and then ii) just plot it using standard graphics and type = "l" (or type = "b" or type = "o" if you want to see the observation times).
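As a minimal sketch of step i), assuming the raw dates are character strings in the "2011-Aug-24" style shown in the question: the all-NA Date column in the question's str() output is what you would expect if the strings were parsed with a format that does not match them. The format string "%Y-%b-%d", the variable names, and the pairing of these dates with these prices are my assumptions for illustration, and %b depends on the locale knowing English month abbreviations.

```r
# Illustrative values only: dates and prices copied from the question's output,
# paired up here purely for the sake of the example.
raw    <- c("2011-Aug-24", "2011-Aug-25", "2011-Aug-26")
prices <- c(0.474697, 0.451907, 0.423184)

# A format that does not match the strings yields NA for every element
bad <- as.Date(raw, format = "%Y-%m-%d")

# %b matches an abbreviated month name (locale-dependent)
good <- as.Date(raw, format = "%Y-%b-%d")

df <- data.frame(V1 = good, V4 = prices)

# With a proper Date column, the formula interface draws a dated x-axis
plot(V4 ~ V1, data = df, type = "l")
```

If the dates still come back as NA, checking the session locale (Sys.getlocale("LC_TIME")) is a reasonable next step.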

Upvotes: 1

mweylandt

Reputation: 274

So I think there are a few things going on here that are worth talking through:

To begin, some example data:

test <- data.frame(End = Sys.Date()+1:5, 
               Start = Sys.Date()+0:4, 
               tck = rep("GOOG",5), 
               EndP= 1:5, 
               StartP= 0:4)

test.sub = subset(test, tck=="GOOG",select = c(End, EndP))

First, note that test and test.sub are both data frames, so calls like test.sub[1] don't really "mean" anything to R.** It's more R-ish to write test.sub[,1] by virtue of consistency with other R structures. If you compare the results of str(test.sub[1]) and str(test.sub[,1]) you'll see that R treats them slightly differently.
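The difference between the two forms of indexing can be seen with a small sketch. The data frame below is a stand-in with fixed dates (my assumption, so the result does not depend on Sys.Date()); only base R is used.

```r
# Stand-in for test.sub with fixed dates
test.sub <- data.frame(End = as.Date("2011-10-19") + 0:4, EndP = 1:5)

# Single-bracket, single-index: still a data frame with one column
one.col <- test.sub[1]
class(one.col)        # "data.frame"
length(test.sub[2])   # 1, i.e. the number of columns, not rows

# Matrix-style or double-bracket indexing: the underlying vector
vec <- test.sub[, 1]
class(vec)            # "Date"
length(test.sub[, 2]) # 5, one element per row
class(test.sub[[1]])  # "Date", same as test.sub[, 1]
```

Note that nrow() on a one-column data frame still reports the row count, which is why comparing nrow(test.sub[1]) and nrow(test.sub[2]) does not reveal the problem, while length() does.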

You said you typed:

as.ts(test.sub)
plot(test.sub)

I'd guess you have experience with some sort of OO language where a call like this would modify the object in place. While R does have some OO flavor to it, that doesn't apply here: as.ts() returns a new object and leaves test.sub untouched. So rather than turning test.sub into something of class ts, the first line does the transformation and throws the result away, and the second line plots the data frame you started with. It's an easy fix though:

test.sub.ts <- as.ts(test.sub)
plot(test.sub.ts)

But, this probably isn't what you were looking for either. Rather, R creates a time series that has two variables called "End" (which is the date now coerced to an integer) and "EndP". Funny business like this is part of the reason time series packages like zoo and xts have caught on so I'll detail them instead a little further down.

(Unfortunately, to the best of my understanding, R doesn't keep date stamps with its default ts class, choosing instead to keep start and end dates as well as a frequency. For more general time series work, this is rarely flexible enough)
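A short sketch of both points, using a stand-in data frame with fixed dates (my assumption): as.ts() does not change its argument, and the resulting ts object carries only a start, an end, and a frequency rather than the dates themselves.

```r
# Stand-in for test.sub with fixed dates
test.sub <- data.frame(End = as.Date("2011-10-19") + 0:4, EndP = 1:5)

# Calling as.ts() without assigning the result changes nothing
invisible(as.ts(test.sub))
still.df <- is.data.frame(test.sub)   # TRUE

# Assigning the result gives a (multivariate) ts object
test.sub.ts <- as.ts(test.sub)
is.ts(test.sub.ts)                    # TRUE

# The time information is just start, end, frequency: 1, 5, 1
tsp(test.sub.ts)

# The Date column survives only as plain numbers (days since 1970-01-01)
is.numeric(test.sub.ts[, "End"])      # TRUE
```

The time index here runs 1 to 5, which is why plot.ts() does not show calendar dates on the x-axis.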

You could perhaps get what you wanted by typing

plot(test.sub[,1], test.sub[,2]) 

instead of

plot(test.sub[1], test.sub[2])

since the latter runs into trouble given that you are passing two sub-data frames instead of two vectors (even though it looks like you would be).***

Anyways, with xts (and similarly for zoo):

library(xts) # You may need to install this
xtemp <- xts(test.sub[,2], test.sub[,1]) # Create the xts object
plot(xtemp) 
# Dispatches a xts plot method which does all sorts of nice time series things

Hope some of this helps, and sorry for the inline code that's not identified as such: still getting used to Stack Overflow.

Michael

**In reality, they access the lists that are used to structure a data frame internally, but that's more a code nuance than something worth relying on.

***The nitty-gritty is that when you pass plot(test.sub[1], test.sub[2]) to R, it dispatches the method plot.data.frame which takes a single data frame and tries to interpret the second data frame as an additional plot parameter which gets misinterpreted somewhere way down the line, giving your error.

Upvotes: 14
