R Maharaj
R Maharaj

Reputation: 115

Applying LOESS smoothing to all columns

I am trying to write a script which will apply LOESS smoothing to all columns of my dataframe. I'm using the fANCOVA package which has the loess.as function which has an automated parameter selection.

Here is my dataframe called dat

> dat
        date AUSTRIA GERMANY SWITZERLAND
1   5/1/2022    3908   15769        1023
2   5/2/2022    5541   87119        2994
3   5/3/2022    7071  106908        2148
4   5/4/2022    5920   89796        2150
5   5/5/2022    6023   80521        1918
6   5/6/2022    5075   67328        1909
7   5/7/2022    3429   26870        1183
8   5/8/2022    3883   13256         834
9   5/9/2022    4681   84985        2496
10 5/10/2022    5894   94974        1965
11 5/11/2022    5000   77460        1846
12 5/12/2022    4726   65100        1708
13 5/13/2022    3681   56920        1545
14 5/14/2022    3293   20062         937
15 5/15/2022    2478    9603         696
16 5/16/2022    3247   65791        1939
17 5/17/2022    4376   68200        1476
18 5/18/2022    3365   53791        1426
19 5/19/2022    3117   44358        1269
20 5/20/2022    2693   39967        1080
21 5/21/2022    1734   13764         773
22 5/22/2022    1744    6922         592
23 5/23/2022    2403   47125        1459
24 5/24/2022    3130   46690        1113
25 5/25/2022    2597   35348        1084
26 5/26/2022    1736      NA         503
27 5/27/2022    1813      NA        1162
28 5/28/2022    1881      NA          NA
29 5/29/2022    1736      NA          NA
30 5/30/2022    2389      NA          NA
31 5/31/2022    3571      NA          NA

Then to apply loess.as to Austria for example

dat$date <- as.Date(dat$date, format = "%m/%d/%Y")

dat$DATE_NUM = as.numeric(dat$date)

fit <- loess.as(df$DATE_NUM,df$AUSTRIA, degree=2, criterion="gcv", family ="gaussian",user.span = NULL, plot = F)$fitted

fit will return the desired fitted values for Austria. I want to able to do this same operation across columns for the other countries (note that each has different length), and the output should be in columns similar to the structure of dat. I have tried to look at the solution to a similar question, but I was struggling to follow and ran into errors.

Upvotes: 1

Views: 368

Answers (1)

akrun
akrun

Reputation: 887158

We could do this with lapply/sapply in base R

library(fANCOVA)
out <- sapply(dat[c("AUSTRIA", "GERMANY", "SWITZERLAND")], \(x) {
     i1 <- complete.cases(x)
  replace(x, i1, loess.as(dat$DATE_NUM[i1], x[i1], degree = 2,
  criterion = "gcv",family ="gaussian",user.span = NULL, plot = FALSE)$fitted)
   })

-output

> head(out)
      AUSTRIA   GERMANY SWITZERLAND
[1,] 4014.674  18716.87    1598.758
[2,] 5564.295  79813.73    2015.231
[3,] 6384.348 104581.65    2200.749
[4,] 6497.308  94165.40    2204.399
[5,] 5770.513  81417.57    1895.429
[6,] 4748.673  61337.24    1547.563

data

dat <- structure(list(date = structure(c(19113, 19114, 19115, 19116, 
19117, 19118, 19119, 19120, 19121, 19122, 19123, 19124, 19125, 
19126, 19127, 19128, 19129, 19130, 19131, 19132, 19133, 19134, 
19135, 19136, 19137, 19138, 19139, 19140, 19141, 19142, 19143
), class = "Date"), AUSTRIA = c(3908L, 5541L, 7071L, 5920L, 6023L, 
5075L, 3429L, 3883L, 4681L, 5894L, 5000L, 4726L, 3681L, 3293L, 
2478L, 3247L, 4376L, 3365L, 3117L, 2693L, 1734L, 1744L, 2403L, 
3130L, 2597L, 1736L, 1813L, 1881L, 1736L, 2389L, 3571L), GERMANY = c(15769L, 
87119L, 106908L, 89796L, 80521L, 67328L, 26870L, 13256L, 84985L, 
94974L, 77460L, 65100L, 56920L, 20062L, 9603L, 65791L, 68200L, 
53791L, 44358L, 39967L, 13764L, 6922L, 47125L, 46690L, 35348L, 
NA, NA, NA, NA, NA, NA), SWITZERLAND = c(1023L, 2994L, 2148L, 
2150L, 1918L, 1909L, 1183L, 834L, 2496L, 1965L, 1846L, 1708L, 
1545L, 937L, 696L, 1939L, 1476L, 1426L, 1269L, 1080L, 773L, 592L, 
1459L, 1113L, 1084L, 503L, 1162L, NA, NA, NA, NA), DATE_NUM = c(19113, 
19114, 19115, 19116, 19117, 19118, 19119, 19120, 19121, 19122, 
19123, 19124, 19125, 19126, 19127, 19128, 19129, 19130, 19131, 
19132, 19133, 19134, 19135, 19136, 19137, 19138, 19139, 19140, 
19141, 19142, 19143)), row.names = c("1", "2", "3", "4", "5", 
"6", "7", "8", "9", "10", "11", "12", "13", "14", "15", "16", 
"17", "18", "19", "20", "21", "22", "23", "24", "25", "26", "27", 
"28", "29", "30", "31"), class = "data.frame")

Upvotes: 1

Related Questions