papelr

Reputation: 438

Creating a Time Series from an Existing Data Set

I want to convert the following data into a time series so that I can use autoplot().

How do I do this so that the "Year" column ends up on the x-axis? (I know the date format has to be 01-01-2006; I'm OK with that.)

Team  PTS    W   GF   GA     S    SA   Year
NSH    88   38  214  233  2382  2365   2014
NSH   104   47  226  202  2614  2304   2015
NSH    96   41  224  213  2507  2231   2016
NSH    94   41  238  220  2557  2458   2017
NSH   117   53  261  204  2641  2650   2018

Using as.ts() turns the Year column into very large, unusable numbers. I want to use the resulting time series for forecasting: ARIMA, VARs, etc. Thanks!

Upvotes: 1

Views: 825

Answers (2)

markus

Reputation: 26333

Does this give you what you want?

df_ts <- ts(df[ , setdiff(names(df), c("Team", "Year"))],
            start = 2014,
            end = 2018,
            frequency = 1)
class(df_ts)
#[1] "mts"    "ts"     "matrix"

I excluded the columns Team and Year from the coercion because Year is unneeded (the time index is set through start and frequency) and Team is of type character. From ?ts:

Time series must have at least one observation, and although they need not be numeric there is very limited support for non-numeric series.

Use ggfortify::autoplot.ts for plotting:

library(ggfortify)
autoplot(df_ts)
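
To check that the years really ended up on the time axis, the time attributes of the series can be inspected with base R. The following is only a minimal sketch: it rebuilds two of the numeric columns from the question (PTS and W) so that it runs on its own, and the object name pts is just an illustrative choice.

```r
# Rebuild a small version of the example data (two numeric columns plus Year)
df <- data.frame(
  PTS  = c(88L, 104L, 96L, 94L, 117L),
  W    = c(38L, 47L, 41L, 41L, 53L),
  Year = 2014:2018
)

# Drop Year from the values and use it only to set the start of the series
df_ts <- ts(df[, setdiff(names(df), "Year")],
            start = min(df$Year),
            frequency = 1)

time(df_ts)  # the time index that autoplot() puts on the x-axis
tsp(df_ts)   # start, end and frequency of the series

# Extract one column as a univariate series, e.g. as input for a forecasting model
pts <- df_ts[, "PTS"]

# Restrict the series to the years from 2016 onwards
pts_recent <- window(pts, start = 2016)
```

Selecting a single column this way keeps the time attributes, so pts is still a ts object that starts in 2014.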

data

df <- structure(list(Team = c("NSH", "NSH", "NSH", "NSH", "NSH"), PTS = c(88L, 
104L, 96L, 94L, 117L), W = c(38L, 47L, 41L, 41L, 53L), GF = c(214L, 
226L, 224L, 238L, 261L), GA = c(233L, 202L, 213L, 220L, 204L), 
    S = c(2382L, 2614L, 2507L, 2557L, 2641L), SA = c(2365L, 2304L, 
    2231L, 2458L, 2650L), Year = 2014:2018), .Names = c("Team", 
"PTS", "W", "GF", "GA", "S", "SA", "Year"), class = "data.frame", row.names = c(NA, 
-5L))

edit

One way to show missing observations in your plot is to turn implicit missing observations into explicit ones. I will use tidyr's complete():

library(tidyr)
df_complete <- complete(df_incomplete, Year = min(Year):max(Year))
df_complete_ts <- ts(df_complete[ , setdiff(names(df_complete), c("Team", "Year"))],
                     start = 2011,
                     frequency = 1)
autoplot(df_complete_ts)
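
If tidyr is not available, roughly the same gap filling can be sketched in base R with merge() instead of complete(). This is a swapped-in alternative, not the approach used above, and the names df_small, all_years, df_filled and pts_ts are made up for the illustration.

```r
# Small example with an implicit gap: there is no row for 2013
df_small <- data.frame(
  Year = c(2011L, 2012L, 2014L),
  PTS  = c(88L, 88L, 104L)
)

# Every year between the first and the last observed year
all_years <- data.frame(Year = min(df_small$Year):max(df_small$Year))

# Left join onto the full range of years: the missing year gets a row with NA
df_filled <- merge(all_years, df_small, by = "Year", all.x = TRUE)

# The series now has one value per year, with an explicit NA for 2013
pts_ts <- ts(df_filled$PTS, start = min(df_filled$Year), frequency = 1)
```

Without this step, ts() would treat the three rows as consecutive years and the observation from 2014 would be plotted at 2013.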

data2

df_incomplete <- structure(list(Team = c("NSH", "NSH", "NSH", "NSH", "NSH", "NSH", 
"NSH"), PTS = c(88L, 88L, 88L, 104L, 96L, 94L, 117L), W = c(38L, 
38L, 38L, 47L, 41L, 41L, 53L), GF = c(214L, 214L, 214L, 226L, 
224L, 238L, 261L), GA = c(233L, 233L, 233L, 202L, 213L, 220L, 
204L), S = c(2382L, 2382L, 2382L, 2614L, 2507L, 2557L, 2641L), 
    SA = c(2365L, 2365L, 2365L, 2304L, 2231L, 2458L, 2650L), 
    Year = c(2011L, 2012L, 2014L, 2015L, 2016L, 2017L, 2018L)), .Names = c("Team", 
"PTS", "W", "GF", "GA", "S", "SA", "Year"), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 1

M. Wickers

Reputation: 21

I have had success using the ts() function in R. For yearly data, the code would look something like this:

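# `data` is assumed to hold only the numeric columns (drop Team and Year first)
library(ggfortify)  # provides autoplot() for ts objects
df <- ts(data, frequency = 1, start = 2014)
autoplot(df)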

This should give you the results you want.

Upvotes: 2
