Conor

Reputation: 143

How to interpret seasonal_trough_year in feasts::feat_stl

I am looking for guidance on how to interpret the seasonal_trough_year feature that feat_stl generates for my time series. My understanding is that the output is an integer identifying the position within the seasonal period, i.e. for monthly seasonality, 1 means January, 6 means June and 12 means December (mapping to month.abb).

For example: yearmonth("2020 Jan") |> month() returns 1.

However, I am getting 0 for the seasonal_trough_year of one of my time series.

I am pulling "Labor Force Participation Rate - 20 Yrs. & over" for men and women from FRED. Reproducible code:

library(fpp3)
library(purrr)
library(fredr)

# Get an API key from https://fred.stlouisfed.org/docs/api/api_key.html
# and register it before running the code below, e.g. with fredr_set_key()

fred_df_raw <- c("men >= age 20" = "LNU01300025",
                 "women >= age 20" = "LNU01300026") |> 
  map_dfr(fredr, .id = "series")

fred_df <- fred_df_raw |> 
  select(series, date, value) |> 
  rename(participation_rate = value) |> 
  mutate(date = yearmonth(date)) |> 
  as_tsibble(key = series, index = date)

fred_df |> 
  model(stl = STL(participation_rate ~ trend() + season())) |> 
  components() |>
  autoplot()

fred_tile <- tile_tsibble(fred_df, .size = 4*12) |> 
  arrange(series, .id) |> 
  group_by(series, .id) |> 
  filter(n() == 4*12) |> #only keep complete tiles
  ungroup()

fred_tile_features <- fred_tile |>
  features(participation_rate, feature_set(pkgs = "feasts")) 

fred_tile_features |> 
  distinct(seasonal_trough_year)

fred_tile_features |> 
  distinct(seasonal_peak_year)

fred_tile_features |> 
  select(series, .id, seasonal_peak_year, seasonal_trough_year) |> 
  filter(seasonal_trough_year == 0)

fred_tile |>
  filter(.id == 18,
         series == "men >= age 20") |> 
  autoplot()

# I would expect the trough to be December
fred_tile |>
  filter(.id == 18,
         series == "men >= age 20") |> 
  model(stl = STL(participation_rate ~ trend() + season())) |> 
  components() |> 
  ggplot(aes(date, season_year)) +
  geom_line() +
  geom_point() +
  geom_point(aes(color = month(date, abbr = TRUE, label = TRUE) == "Dec"))


Definitions from the FPP3 book:

Upvotes: 1

Views: 22

Answers (1)

Mitchell O'Hara-Wild

Reputation: 2459

The current implementation of feat_stl() reports the seasonal peak and trough as positions within the seasonal period, counted from time 1 (the first time point in the data), not as calendar months.

library(feasts)
#> Loading required package: fabletools
#> Registered S3 method overwritten by 'tsibble':
#>   method               from 
#>   as_tibble.grouped_df dplyr
uad <- as_tsibble(USAccDeaths)
uad |> autoplot(value)

time plot of seasonal data


# Peak in July, trough in February
library(dplyr)
#> 
#> Attaching package: 'dplyr'
#> The following objects are masked from 'package:stats':
#> 
#>     filter, lag
#> The following objects are masked from 'package:base':
#> 
#>     intersect, setdiff, setequal, union
uad |> 
  as_tibble() |> 
  group_by(month = lubridate::month(index, label = TRUE)) |> 
  summarise(avg = mean(value)) |> 
  arrange(avg)
#> # A tibble: 12 × 2
#>    month    avg
#>    <ord>  <dbl>
#>  1 Feb    7284.
#>  2 Jan    8044 
#>  3 Mar    8062.
#>  4 Apr    8275.
#>  5 Nov    8467.
#>  6 Sep    8700.
#>  7 Dec    8721.
#>  8 Oct    8990.
#>  9 May    9124.
#> 10 Jun    9595.
#> 11 Aug    9749.
#> 12 Jul   10453.

# First observation at 1973 Jan, trough at '2' (February) and peak at '7' (July)
uad |> features(value, feat_stl)
#> # A tibble: 1 × 9
#>   trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year
#>            <dbl>                  <dbl>              <dbl>                <dbl>
#> 1          0.802                  0.945                  7                    2
#> # ℹ 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
#> #   stl_e_acf1 <dbl>, stl_e_acf10 <dbl>
# First observation at 1973 Feb, trough at '1' (February) and peak at '6' (July)
uad |> tail(-1) |> features(value, feat_stl)
#> # A tibble: 1 × 9
#>   trend_strength seasonal_strength_year seasonal_peak_year seasonal_trough_year
#>            <dbl>                  <dbl>              <dbl>                <dbl>
#> 1          0.788                  0.944                  6                    1
#> # ℹ 5 more variables: spikiness <dbl>, linearity <dbl>, curvature <dbl>,
#> #   stl_e_acf1 <dbl>, stl_e_acf10 <dbl>

Created on 2025-01-24 with reprex v2.1.1

This is because the current implementation is based on numeric seasonal periods (modular arithmetic) and has no sense of a 'seasonal origin', so it simply uses the first time point in the data. This is similar to how some parts of the world use calendars where the week starts on Monday, while others start it on Sunday. Fundamentally, weeks could start on Wednesday and years could start in February; this makes no difference mathematically.

Ideally it would explicitly state the name of the season (e.g. trough: February, peak: July); however, this currently isn't possible (at least not in full generality). I'm currently working on a time+calendar package, mixtime, which will improve this with support for calendar-based seasons/cycles (rather than the current numeric implementation of 12 for months).
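
In the meantime, for monthly data the reported position can be converted back to a calendar month by hand, provided the month of the first observation is known. The following is a minimal base R sketch of that conversion, not part of feasts: position_to_month() and its arguments are hypothetical names introduced here for illustration. It also assumes that a reported value of 0 is the wrapped form of the 12th position, which is consistent with the modular arithmetic described above but has not been checked against the feasts source.

```r
# Hypothetical helper (not part of feasts): convert a monthly seasonal
# position, counted from the first observation, into a calendar month.
#   position    - value of seasonal_peak_year / seasonal_trough_year
#   start_month - calendar month (1-12) of the first observation
position_to_month <- function(position, start_month) {
  # Shift the position by the starting month, then wrap around the year.
  # R's %% returns a non-negative result for a positive modulus, so a
  # position of 0 maps to the 12th position (an assumption, see above).
  idx <- (start_month - 1 + position - 1) %% 12 + 1
  month.abb[idx]
}

# USAccDeaths starting in 1973 Jan: trough reported as 2
position_to_month(2, start_month = 1)
#> [1] "Feb"

# Same series with the first observation dropped (starts in Feb)
position_to_month(1, start_month = 2)
#> [1] "Feb"
position_to_month(6, start_month = 2)
#> [1] "Jul"

# A value of 0 for a window that starts in January
position_to_month(0, start_month = 1)
#> [1] "Dec"
```

For tiled data, start_month would be the month of the first observation in each tile (for example lubridate::month(min(date)) computed per series and .id), since each tile has its own time 1.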

Upvotes: 1
