Pistol Pete

Reputation: 1077

Python Kernel Smoothing

I have some R code that I am trying to replicate in Python. In the R file, I have a dataframe and I smooth one column of the dataframe with

smoothedTime <- ksmooth(1:length(df$time), df$time, bandwidth=100, x.points=(1:length(df$time)))$y

In Python, I am using the scikit-fda library, specifically skfda.preprocessing.smoothing.kernel_smoothers.NadarayaWatsonSmoother() with smoothing_parameter set to 100, because the R ksmooth function is a Nadaraya-Watson kernel regression estimate. The problem is that the smoothed values I get do not match R's. The kernel argument of ksmooth is c("box", "normal"), so it defaults to "box", but I don't see a box kernel for NadarayaWatsonSmoother(). Since NadarayaWatsonSmoother() uses a normal kernel by default, I tried

smoothedTime <- ksmooth(1:length(df$time), df$time, bandwidth=100, kernel=c("normal"), x.points=(1:length(df$time)))$y

and the results were still different. I'm wondering why I'm not getting the same answers, and what I can do to get the same answers.

The relevant code is

Python Code:

import skfda
from skfda import FDataGrid
from skfda.misc import kernels
import skfda.preprocessing.smoothing.kernel_smoothers as ks

myTime = [-0.01, -0.02, -0.01, -0.01, -0.04, -0.05, -0.07, -0.1, -0.12, -0.15, -0.19, -0.22, -0.26, -0.27, -0.31, -0.33, -0.36, -0.38, -0.4, -0.42, -0.44, -0.44, -0.46, -0.47, -0.48, -0.49, -0.5, -0.49, -0.51, -0.51, -0.51, -0.51, -0.5, -0.48, -0.48, -0.46, -0.45, -0.43, -0.41, -0.39, -0.37, -0.34, -0.34, -0.32, -0.31, -0.32, -0.35, -0.35, -0.37, -0.39, -0.42, -0.45, -0.5, -0.52, -0.55, -0.58, -0.6, -0.6, -0.6, -0.6]
fd = FDataGrid(sample_points=[*range(1, len(myTime)+1)],
           data_matrix=[myTime])
smoother = ks.NadarayaWatsonSmoother(smoothing_parameter=100)
smoothed = smoother.fit_transform(fd)

R Code:

df <- data.frame(time = c(-0.01, -0.02, -0.01, -0.01, -0.04, -0.05, -0.07, -0.1, -0.12, -0.15, -0.19, -0.22, -0.26, -0.27, -0.31, -0.33, -0.36, -0.38, -0.4, -0.42, -0.44, -0.44, -0.46, -0.47, -0.48, -0.49, -0.5, -0.49, -0.51, -0.51, -0.51, -0.51, -0.5, -0.48, -0.48, -0.46, -0.45, -0.43, -0.41, -0.39, -0.37, -0.34, -0.34, -0.32, -0.31, -0.32, -0.35, -0.35, -0.37, -0.39, -0.42, -0.45, -0.5, -0.52, -0.55, -0.58, -0.6, -0.6, -0.6, -0.6))
smoothedTime <- ksmooth(1:length(df$time), df$time, kernel="normal", bandwidth=100, x.points=(1:length(df$time)))$y

Upvotes: 0

Views: 1745

Answers (1)

Mabus

Reputation: 1493

The reason for this behaviour is that the ksmooth function in R rescales the bandwidth differently for each kernel (see the source code), while scikit-fda simply divides by the passed bandwidth before applying the kernel. You can obtain the same results as in R if you multiply the smoothing_parameter by 0.3706506 (for a normal kernel) or by 0.5 (for a box kernel). For the bandwidth of 100 in the question, that means smoothing_parameter=37.06506 with the normal kernel. Note that the box kernel is also available in scikit-fda by passing the parameter kernel=skfda.misc.kernels.uniform.
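To check the scaling independently of either library, here is a minimal NumPy sketch of a Nadaraya-Watson smoother that applies the two factors mentioned above before computing the weights. The helper name ksmooth_like and its arguments are made up for illustration and are not part of R or scikit-fda. It also assumes that nothing else differs between the two implementations: it does not reproduce any tail truncation R may apply to the normal kernel, so small discrepancies are possible for small bandwidths.

```python
import numpy as np

# Bandwidth scaling factors quoted in the answer for R's ksmooth.
R_KSMOOTH_SCALE = {"normal": 0.3706506, "box": 0.5}


def ksmooth_like(x, y, bandwidth, kernel="normal", x_points=None):
    """Nadaraya-Watson estimate with an R-style bandwidth (hypothetical helper)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    x_points = x if x_points is None else np.asarray(x_points, dtype=float)

    # Rescale the bandwidth the way the answer describes, then divide by it.
    h = bandwidth * R_KSMOOTH_SCALE[kernel]
    u = np.abs(x_points[:, None] - x[None, :]) / h

    if kernel == "normal":
        # Unnormalised Gaussian weights; the constant cancels in the ratio.
        weights = np.exp(-0.5 * u**2)
    else:
        # Box kernel: equal weight for every sample within the scaled window.
        weights = (u <= 1.0).astype(float)

    num = weights @ y
    den = weights.sum(axis=1)
    # Return NaN where no sample falls inside the window.
    safe_den = np.where(den > 0, den, 1.0)
    return np.where(den > 0, num / safe_den, np.nan)
```

With the data from the question, ksmooth_like(np.arange(1, len(myTime) + 1), myTime, 100) plays the role of the R call with kernel="normal", and the result can be compared against both R and a scikit-fda smoother built with smoothing_parameter=100 * 0.3706506.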

Disclaimer: I am the maintainer of scikit-fda. Sorry for my late answer, but I am not notified when a question mentioning it appears on this site. If you have future questions regarding the package, you can try opening an issue or a discussion. I am notified of these and can usually answer within a few hours or days.

Upvotes: 2
