Reputation: 1077
I have some R code that I am trying to replicate in Python. In the R file, I have a dataframe and I smooth one column of the dataframe with
smoothedTime <- ksmooth(1:length(df$time), df$time, bandwidth=100, x.points=(1:length(df$time)))$y
In Python, I am using the scikit-fda library and skfda.preprocessing.smoothing.kernel_smoothers.NadarayaWatsonSmoother() to do the smoothing, with the smoothing_parameter set to 100, because that is what the R ksmooth function is based on. The problem that I am encountering is that the smoothing I'm getting is not the same. By default, the kernel in ksmooth is c("box", "normal"), but I don't see a box kernel for NadarayaWatsonSmoother(). So, because NadarayaWatsonSmoother() has a normal kernel by default, I tried
smoothedTime <- ksmooth(1:length(df$time), df$time, bandwidth=100, kernel=c("normal"), x.points=(1:length(df$time)))$y
and the results were still different. I'm wondering why I'm not getting the same answers, and what I can do to get the same answers.
The relevant code is
Python Code:
import skfda
from skfda import FDataGrid
from skfda.misc import kernels
import skfda.preprocessing.smoothing.kernel_smoothers as ks
myTime = [-0.01, -0.02, -0.01, -0.01, -0.04, -0.05, -0.07, -0.1, -0.12, -0.15, -0.19, -0.22, -0.26, -0.27, -0.31, -0.33, -0.36, -0.38, -0.4, -0.42, -0.44, -0.44, -0.46, -0.47, -0.48, -0.49, -0.5, -0.49, -0.51, -0.51, -0.51, -0.51, -0.5, -0.48, -0.48, -0.46, -0.45, -0.43, -0.41, -0.39, -0.37, -0.34, -0.34, -0.32, -0.31, -0.32, -0.35, -0.35, -0.37, -0.39, -0.42, -0.45, -0.5, -0.52, -0.55, -0.58, -0.6, -0.6, -0.6, -0.6]
fd = FDataGrid(sample_points=[*range(1, len(myTime)+1)],
               data_matrix=[myTime])
smoother = ks.NadarayaWatsonSmoother(smoothing_parameter=100)
smoothed = smoother.fit_transform(fd)
R Code:
df$time <- c(-0.01, -0.02, -0.01, -0.01, -0.04, -0.05, -0.07, -0.1, -0.12, -0.15, -0.19, -0.22, -0.26, -0.27, -0.31, -0.33, -0.36, -0.38, -0.4, -0.42, -0.44, -0.44, -0.46, -0.47, -0.48, -0.49, -0.5, -0.49, -0.51, -0.51, -0.51, -0.51, -0.5, -0.48, -0.48, -0.46, -0.45, -0.43, -0.41, -0.39, -0.37, -0.34, -0.34, -0.32, -0.31, -0.32, -0.35, -0.35, -0.37, -0.39, -0.42, -0.45, -0.5, -0.52, -0.55, -0.58, -0.6, -0.6, -0.6, -0.6)
smoothedTime <- ksmooth(1:length(df$time), df$time, kernel="normal", bandwidth=100, x.points=(1:length(df$time)))$y
Upvotes: 0
Views: 1745
Reputation: 1493
The reason for this behaviour is that the ksmooth function in R scales the bandwidth differently for each kernel (see the source code), while scikit-fda simply divides by the passed bandwidth before applying the kernel. You can obtain the same results as in R if you multiply the smoothing_parameter by 0.3706506 (for a normal kernel) or by 0.5 (for a box kernel; note that this kernel can also be used in scikit-fda by passing the parameter kernel=skfda.misc.kernels.uniform).
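To make the rescaling concrete, here is a minimal pure-Python sketch of a Nadaraya-Watson smoother that multiplies the bandwidth by the factors quoted above before applying the kernel. It is an illustration only, not the code of ksmooth or scikit-fda: the names R_SCALE, nadaraya_watson and ksmooth_like are hypothetical, and the sketch ignores the cutoff that, as far as I know, R applies to distant points for the normal kernel.

```python
import math

# Factors by which R's ksmooth rescales `bandwidth` (values quoted in the answer).
R_SCALE = {"normal": 0.3706506, "box": 0.5}


def gaussian(u):
    """Unnormalised Gaussian kernel; the constant cancels in the weighted mean."""
    return math.exp(-0.5 * u * u)


def uniform(u):
    """Box kernel: equal weight inside [-1, 1], zero outside."""
    return 1.0 if abs(u) <= 1.0 else 0.0


def nadaraya_watson(x, y, x_out, h, kernel):
    """Plain Nadaraya-Watson estimate with weights kernel((x0 - x_i) / h)."""
    smoothed = []
    for x0 in x_out:
        weights = [kernel((x0 - xi) / h) for xi in x]
        smoothed.append(sum(w * yi for w, yi in zip(weights, y)) / sum(weights))
    return smoothed


def ksmooth_like(x, y, x_out, bandwidth, kernel="normal"):
    """Hypothetical helper: rescale the bandwidth as described, then smooth."""
    kernels = {"normal": gaussian, "box": uniform}
    return nadaraya_watson(x, y, x_out, bandwidth * R_SCALE[kernel], kernels[kernel])


my_time = [-0.01, -0.02, -0.01, -0.01, -0.04, -0.05]  # first values from the question
x = list(range(1, len(my_time) + 1))
print(ksmooth_like(x, my_time, x, bandwidth=100, kernel="normal"))
```

In scikit-fda terms, the same idea amounts to passing smoothing_parameter=100 * 0.3706506 instead of 100 for the normal kernel.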
Disclaimer: I am the maintainer of scikit-fda. Sorry for my late answer, but I am not notified when a question mentioning it appears on this page. If you have future questions regarding the package, you can try opening an issue or a discussion. I am notified of these and can usually answer within a few hours or days.
Upvotes: 2