Reputation: 143
I am looking for guidance on how to interpret the seasonal_trough_year feature that feat_stl generates for my time series. My understanding is that the output is an integer identifying the position within the seasonal period, i.e. for monthly seasonality, 1 means January, 6 means June and 12 means December (mapping to month.abb).
For example, yearmonth("2020 Jan") |> month() returns 1.
However, I am getting 0 for the seasonal_trough_year of one of my time series.
I am pulling "Labor Force Participation Rate - 20 Yrs. & over" for men and women from FRED. Reproducible code:
library(fpp3)
library(purrr)
library(fredr)
# get an API key from https://fred.stlouisfed.org/docs/api/api_key.html and register it with fredr_set_key()
fred_df_raw <- c("men >= age 20" = "LNU01300025",
                 "women >= age 20" = "LNU01300026") |>
  map_dfr(fredr, .id = "series")

fred_df <- fred_df_raw |>
  select(series, date, value) |>
  rename(participation_rate = value) |>
  mutate(date = yearmonth(date)) |>
  as_tsibble(key = series, index = date)

fred_df |>
  model(stl = STL(participation_rate ~ trend() + season())) |>
  components() |>
  autoplot()

fred_tile <- tile_tsibble(fred_df, .size = 4*12) |>
  arrange(series, .id) |>
  group_by(series, .id) |>
  filter(n() == 4*12) |> # only keep complete tiles
  ungroup()

fred_tile_features <- fred_tile |>
  features(participation_rate, feature_set(pkgs = "feasts"))

fred_tile_features |>
  distinct(seasonal_trough_year)

fred_tile_features |>
  distinct(seasonal_peak_year)

fred_tile_features |>
  select(series, .id, seasonal_peak_year, seasonal_trough_year) |>
  filter(seasonal_trough_year == 0)

fred_tile |>
  filter(.id == 18,
         series == "men >= age 20") |>
  autoplot()

# I would expect the trough to be December
fred_tile |>
  filter(.id == 18,
         series == "men >= age 20") |>
  model(stl = STL(participation_rate ~ trend() + season())) |>
  components() |>
  ggplot(aes(date, season_year)) +
  geom_line() +
  geom_point() +
  geom_point(aes(color = month(date, abbr = TRUE, label = TRUE) == "Dec"))
Definitions from the FPP3 book:
Upvotes: 1
Views: 22
Reputation: 2459
The current implementation of feat_stl() defines the seasonal peak and trough as positions within the seasonal period, counted from time 1 (the first time point in the data), not from a calendar origin such as January.
library(feasts)
#> Loading required package: fabletools
#> Registered S3 method overwritten by 'tsibble':
#> method from
#> as_tibble.grouped_df dplyr
uad <- as_tsibble(USAccDeaths)
uad |> autoplot(value)
# Peak in July, trough in February
library(dplyr)
#>
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#>
#> filter, lag
#> The following objects are masked from 'package:base':
#>
#> intersect, setdiff, setequal, union
uad |>
as_tibble() |>
group_by(month = lubridate::month(index, label = TRUE)) |>
summarise(avg = mean(value)) |>
arrange(avg)
#> # A tibble: 12 × 2
#> month avg
#> <ord> <dbl>
#> 1 Feb 7284.
#> 2 Jan 8044
#> 3 Mar 8062.
#> 4 Apr 8275.
#> 5 Nov 8467.
#> 6 Sep 8700.
#> 7 Dec 8721.
#> 8 Oct 8990.
#> 9 May 9124.
#> 10 Jun 9595.
#> 11 Aug 9749.
#> 12 Jul 10453.
# First observation at 1973 Jan, trough at '2' (February) and peak at '7' (July)
uad |> features(value, feat_stl)
#> # A tibble: 1 × 9
#> trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.802 0.945 7 2
#> # ℹ 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
#> # stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
# First observation at 1973 Feb, trough at '1' (February) and peak at '6' (July)
uad |> tail(-1) |> features(value, feat_stl)
#> # A tibble: 1 × 9
#> trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year
#> <dbl> <dbl> <dbl> <dbl>
#> 1 0.788 0.944 6 1
#> # ℹ 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
#> # stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
Created on 2025-01-24 with reprex v2.1.1
This is because the current implementation is based on numeric seasonal periods (modular arithmetic) and has no sense of a 'seasonal origin', so it simply uses the first time point in the data. This is similar to how calendars in some parts of the world start the week on Monday, while others start it on Sunday. Fundamentally, weeks could start on Wednesday and years could start in February; this makes no difference mathematically.
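Until season names are reported directly, the offset can be undone by hand. The following is a minimal sketch for monthly data only; season_index_to_month() is a hypothetical helper, not part of feasts. It assumes that the feature value is the position of the peak or trough counted from the first observation, taken modulo the period, so that a value of 0 would denote the last position in the cycle. That reading of 0 is inferred from the modular arithmetic and from the 0 reported in the question, not from documented behaviour.

```r
# Hypothetical helper (not part of feasts): translate a seasonal_peak_year or
# seasonal_trough_year value into a month abbreviation for monthly data.
#   index       - the feature value returned by feat_stl()
#   start_month - calendar month (1-12) of the first observation in the series
# Assumption: index 1 is the month of the first observation and index 0 is the
# last position in the cycle (position `period` modulo `period`).
season_index_to_month <- function(index, start_month, period = 12L) {
  month_number <- (start_month - 1L + index - 1L) %% period + 1L
  month.abb[month_number]
}

# Series starting in January: trough '2' is February, peak '7' is July
season_index_to_month(c(2, 7), start_month = 1)
#> [1] "Feb" "Jul"

# Series starting in February: trough '1' is February, peak '6' is July
season_index_to_month(c(1, 6), start_month = 2)
#> [1] "Feb" "Jul"

# Series starting in January: a value of 0 maps to December
season_index_to_month(0, start_month = 1)
#> [1] "Dec"
```

Under these assumptions, a seasonal_trough_year of 0 for a tile whose first observation is in January would correspond to a December trough, which matches what the question expected.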
Ideally it would explicitly state the name of the season (e.g. trough: February and peak: July); however, this currently isn't possible (at least not in full generality). I'm currently working on a time+calendar package, mixtime, which will improve this with support for calendar-based seasons/cycles (rather than the current numeric implementation of 12 for months).
Upvotes: 1