Fitting distribution on curve

Question

I am trying to fit a curve to points to calculate the maximum as well as the center.

I have two dimensional data (x and y axis) and I am using the package pracma with the findpeak function to detect peaks in this matrix.

peak <- structure(c(86.0953485633098, 86.0955380284491, 86.0957274942138, 
                    86.095916960604, 86.0961064276196, 86.0962958952606, 86.096485363527, 
                    86.0966748324189, 86.0968643019362, 86.0970537720789, 86.097243242847, 
                    86.0974327142406, 86.0976221862597, 86.0978116589041, 86.0980011321741, 
                    86.0981906060695, 86.0983800805903, 86.0985695557366, 0, 178.362274169922, 
                    1118.56115722656, 1993.09130859375, 2681.42016601562, 3771.77612304688, 
                    12054.2744140625, 41142.8828125, 82771.796875, 126647.4296875, 
                    124873.390625, 80395.8046875, 38812.58203125, 11192.2685546875, 
                    3193.95947265625, 1768.93249511719, 487.752838134766, 0), 
                  .Dim = c(18L, 2L))
pracma::findpeaks(peak[,2])
plot(peak)

My problem is, that findpeak only gives the maximum within the data points but I guess by plotting everyone can imagine that the maximum is a bit higher and as well as on a different y coordinate. What I expect is roughly x=86.09712 and y=12750.00... Something like that.

I found also MASS::fitdistr but this only works for distributions and not for my data. I guess my question is already asked but I couldn't find a solution, maybe I am missing the correct vocabulary to search for.

alistaire · Accepted Answer

One way to do it is to make a model, then use predict to find the fitted values within a range where you know it's high, within which the maximum can be found with which.max:

peak <- structure(c(86.0953485633098, 86.0955380284491, 86.0957274942138, 
                    86.095916960604, 86.0961064276196, 86.0962958952606, 86.096485363527, 
                    86.0966748324189, 86.0968643019362, 86.0970537720789, 86.097243242847, 
                    86.0974327142406, 86.0976221862597, 86.0978116589041, 86.0980011321741, 
                    86.0981906060695, 86.0983800805903, 86.0985695557366, 0, 178.362274169922, 
                    1118.56115722656, 1993.09130859375, 2681.42016601562, 3771.77612304688, 
                    12054.2744140625, 41142.8828125, 82771.796875, 126647.4296875, 
                    124873.390625, 80395.8046875, 38812.58203125, 11192.2685546875, 
                    3193.95947265625, 1768.93249511719, 487.752838134766, 0), 
                  .Dim = c(18L, 2L))
df_peak <- as.data.frame(peak)
names(df_peak) <- c('x', 'y')

# use any model that nicely fits your data
model <- mgcv::gam(y ~ s(x), data = df_peak)
df_peak$fitted <- predict(model)

max_zone_x <- df_peak$x[tail(order(df_peak$fitted))]
max_zone <- data.frame(x = seq(max_zone_x[1], max_zone_x[2], length.out = 1000))
max_zone$y <- predict(model, max_zone)

max_zone[which.max(max_zone$y), ]
#>            x        y
#> 507 86.09714 129699.8

This approach is limited to finding one peak at a time, but there are likely more robust alternatives from the signal processing world for multimodal data.

Fitting distribution on curve

Answers (1)

Related Questions