Reputation: 752

Find the "peak" of a set of data

I have a set of data for which I'd like to find an average peak. I've done some testing in Numbers.app to see what I'm after. If I make a chart of the dataset, it has a feature it calls "polynomial trendline", which draws a curve through the data, and the peak of that curve looks exactly like the point/value I'm after.

So how could I programmatically calculate that curve and find that tangent on the curve?

I've been looking around on Wikipedia and found topics like "Normal distribution" and "Polynomial regression" which seem very much related, but I've always found it hard to follow the equations on Wikipedia, so I'm hoping someone here could give me a programmatic example.

Here's a couple of charts to illustrate what I'm after. The green dots are the data points and the blue line is the "polynomial trendline" (of order 6). ~~The "peak" of that trendline is what I'm after.~~

[Chart: example with even dataset] [Chart: example with uneven dataset]

Updated question:

After some answers I realize my question needs to be rephrased, as the problem is not really how to find the peak of the curve but rather how to generate that blue curve from the green points so I can find where in the dataset the "weight" lies. The goal is to get a sort of 'average maximum'.

I guess another question would be "what is this particular problem actually called?" ;)

Upvotes: 5

Answers (6)

Chris

Reputation: 1

I'm a total "R" newbie but I've been working through the same thing in my own data so I thought I would share. I am sure I'll get tonnes of flak for this being a bad way of doing it (or not a 'neat' way of doing it) but it serves its purpose for me, at least for now.

I have 50 data sets that have a peak shape like yours (steep slope on the leading edge, gentler slope on the declining edge). First I tested a number of polynomial fits to find the best "fit for purpose" without overfitting...

x <- dataset$x  ## or pull from column in table e.g., dataset[,1]
y <- dataset$y  ## or pull from column in table e.g., dataset[,2]
sigma <- numeric(0)  ## residual standard error for each order k ("var" is a built-in function, so use our own vector)
k <- 2  ## knew it was polynomial so started with 3
while (k < 100) {
  k <- k + 1
  fit <- lm(y ~ poly(x, k, raw=TRUE))
  sigma[k] <- summary(fit)$sigma
}
plot(sigma)

In this case, a polynomial of order 11 was the best fit without overfitting. You can then run an ANOVA to make sure, but I'll skip all that.

Now I created my polynomial from the coefficients of the "lm" above.

library(polynom)  ## provides polynomial(), plus deriv() and predict() methods for it
fit <- lm(y ~ poly(x, 11, raw=TRUE))
fit.coef <- c(summary(fit)$coefficients[1,1], summary(fit)$coefficients[2,1],...
fit.poly <- polynomial(fit.coef)

Then the derivative:

fit.deriv <- deriv(fit.poly)

Now, for the slope at the peak, you can simply substitute the x value at which your original polynomial reaches its maximum into the derivative.

I wanted all the slopes so...

fit.slope <- predict(fit.deriv,x) ## x here represents all the x values above.  For a single value you can just replace x with the value of x representing the max value in your polynomial
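For anyone not using R, the same three steps (least-squares polynomial fit, derivative, slope at the peak) can be sketched in plain Python. This is only a minimal sketch under some assumptions: the helper names `polyfit`, `polyval`, `polyder` and `argmax_on_grid` are made up for this example, the fit solves the normal equations directly (which gets numerically shaky for high orders like 11), and the peak is located by a simple grid search rather than by solving for the roots of the derivative.

```python
def polyfit(xs, ys, degree):
    """Least-squares polynomial fit; returns [c0, c1, ...] for c0 + c1*x + ..."""
    n = degree + 1
    # Normal equations A c = b with A[i][j] = sum(x^(i+j)), b[i] = sum(y * x^i)
    a = [[sum(x ** (i + j) for x in xs) for j in range(n)] for i in range(n)]
    b = [sum(y * x ** i for x, y in zip(xs, ys)) for i in range(n)]
    # Gaussian elimination with partial pivoting
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(a[r][col]))
        a[col], a[pivot] = a[pivot], a[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
            b[r] -= factor * b[col]
    # Back substitution
    coef = [0.0] * n
    for i in range(n - 1, -1, -1):
        known = sum(a[i][j] * coef[j] for j in range(i + 1, n))
        coef[i] = (b[i] - known) / a[i][i]
    return coef


def polyval(coef, x):
    """Evaluate the polynomial at x using Horner's scheme."""
    result = 0.0
    for c in reversed(coef):
        result = result * x + c
    return result


def polyder(coef):
    """Coefficients of the derivative polynomial."""
    return [i * c for i, c in enumerate(coef)][1:]


def argmax_on_grid(coef, lo, hi, steps=10000):
    """x in [lo, hi] where the polynomial is largest, found by grid search."""
    grid = [lo + (hi - lo) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda x: polyval(coef, x))


# Hypothetical data lying exactly on y = -x^2 + 4x + 1
xs = [0, 1, 2, 3, 4]
ys = [1, 4, 5, 4, 1]
coef = polyfit(xs, ys, 2)
x_peak = argmax_on_grid(coef, min(xs), max(xs))
slope_at_peak = polyval(polyder(coef), x_peak)  # close to zero at the peak
```

Restricting the search to the range of the data matters, because a high-order polynomial tends to shoot off to large values just outside the fitted interval.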

Hope that both helps with the original question and at the same time invites comments on how to do this better, because I'd love to learn and clean up my code too!

Thanks.

Upvotes: 0

Cade Roux

Reputation: 89721

You could start with calculating the mean and standard deviation/variance. This would tell you some information about the distribution.

I don't think you'll be able to solve the problem for an arbitrary data set. So your data sets would need to share some common characteristic behavior.

After all, fitting a curve can be somewhat arbitrary depending upon the method - it needs to be chosen appropriately for your problem domain - perhaps there needs to be some weighting or data cleansing to throw out outlying values first.
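As a rough illustration of the mean/standard deviation idea and of throwing out outlying values first, here is a small Python sketch using only the standard library. The sample values, the function name `drop_outliers` and the cut-off of two standard deviations are assumptions made for this example, not a recommendation for any particular data set.

```python
import statistics

# Hypothetical sample values
values = [2, 4, 4, 4, 5, 5, 7, 9]

center = statistics.mean(values)    # arithmetic mean
spread = statistics.pstdev(values)  # population standard deviation


def drop_outliers(data, k=2.0):
    """Keep only values within k standard deviations of the mean."""
    m = statistics.mean(data)
    s = statistics.pstdev(data)
    return [v for v in data if abs(v - m) <= k * s]


# A single wild value (40) is removed; the rest are kept.
cleaned = drop_outliers(values + [40])
```

Whether a fixed cut-off like this is appropriate depends entirely on the problem domain, which is the point made above.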

Upvotes: 2

Hari Menon

Reputation: 35465

Let's say you are plotting Y vs X. You already have the values of Y corresponding to each X. Let Y(X1) mean the value of Y when X=X1.

Set a variable max to the first value of Y (starting from 0 would give the wrong answer if every Y is negative). Then go through the value of Y at each X. If Y(X1) > max, set max = Y(X1). Once you have gone through all the Ys, max will hold the peak value of Y.

E.g., in your example, just go through all the green dots and find the maximum of them. That would be the peak, right? Let me know if that's what you wanted. Which programming language are you using? You don't need to go into distributions and such just to get the peak.
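Since no language was given, here is the scan described above as a minimal Python sketch. The function name `peak_by_scan` and the sample points are made up for illustration. The running maximum starts from the first data point, so data whose values are all negative is handled too.

```python
def peak_by_scan(points):
    """Return the (x, y) pair with the largest y from a list of (x, y) pairs."""
    if not points:
        raise ValueError("need at least one data point")
    best = points[0]  # start from the first point rather than from 0
    for point in points[1:]:
        if point[1] > best[1]:
            best = point
    return best


# Hypothetical data where every y is negative
sample = [(0, -5.0), (1, -2.0), (2, -0.5), (3, -3.0)]
peak_x, peak_y = peak_by_scan(sample)
```

Note that this returns the single highest raw data point, so one noisy spike decides the answer.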

Upvotes: 1

nico

Reputation: 51680

Although the data looks like one, you're not necessarily after a normal distribution.

The topic of distribution fitting is quite complex and, unless you have some clear a priori assumptions about what your data distribution is, I would not venture there. If you do have assumptions about the type of distribution, have a look at least squares or maximum likelihood estimation methods.

However, I would rather suggest using a Bézier spline or LOESS to "smooth" your data and then simply finding the maximum of the computed curve.
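To show the "smooth first, then take the maximum" idea without pulling in a library, here is a small Python sketch. Note that it swaps in a centered moving average, which is much cruder than a Bézier spline or LOESS; the function names, the window size and the sample data are assumptions for illustration only.

```python
def moving_average(ys, window=5):
    """Centered moving average; the window shrinks near the ends of the data."""
    half = window // 2
    smoothed = []
    for i in range(len(ys)):
        lo = max(0, i - half)
        hi = min(len(ys), i + half + 1)
        smoothed.append(sum(ys[lo:hi]) / (hi - lo))
    return smoothed


def smoothed_peak(xs, ys, window=5):
    """Return (x, smoothed y) at the maximum of the smoothed curve."""
    smoothed = moving_average(ys, window)
    i = max(range(len(smoothed)), key=lambda j: smoothed[j])
    return xs[i], smoothed[i]


# Hypothetical data: a broad hump around x = 7 plus one spike at x = 1
xs = list(range(13))
ys = [1, 9, 1, 2, 3, 5, 7, 8, 7, 5, 3, 2, 1]
peak_x, peak_y = smoothed_peak(xs, ys)
```

On this data the raw maximum is the spike at x = 1, while the smoothed maximum lands on the hump at x = 7.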

I doubt that an approach using the derivative would work here.

Upvotes: 5

rubenvb

Reputation: 76785

As you speak of normal distributions, and seem to be able to fit data to a function, you should fit to a normal distribution, which has parameters µ and σ, which are respectively the mean and standard deviation of the distribution (see the first formula on the wiki page).

Fit this function to your data, and the peak will be at the mean value, given by µ.
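A proper least-squares fit of the normal curve needs an optimizer, but µ and σ can be estimated roughly by treating each Y value as the weight of its X value (a method-of-moments estimate, named here plainly as a stand-in for a real fit). The Python sketch below assumes the Y values are non-negative and sit on a zero baseline; the function name `gaussian_moments` and the data are made up for this example.

```python
import math


def gaussian_moments(xs, ys):
    """Estimate (mu, sigma) of a bell-shaped curve using ys as weights on xs."""
    total = sum(ys)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    mu = sum(x * y for x, y in zip(xs, ys)) / total
    variance = sum(y * (x - mu) ** 2 for x, y in zip(xs, ys)) / total
    return mu, math.sqrt(variance)


# Hypothetical symmetric data centered on x = 2
xs = [0, 1, 2, 3, 4]
ys = [1, 4, 6, 4, 1]
mu, sigma = gaussian_moments(xs, ys)
```

For skewed data, µ computed this way is the "center of weight" rather than the location of the highest point, which may be closer to the 'average maximum' the question asks about.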

Upvotes: 2

Andrey

Reputation: 60095

The derivative is equal to zero at peaks.
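On discrete data there is no exact derivative, but the same idea can be approximated with finite differences: a peak is where the slope changes from positive to non-positive. A minimal Python sketch, assuming strictly increasing x values (the function name `local_peaks` and the data are hypothetical):

```python
def local_peaks(xs, ys):
    """Return the x values where the finite-difference slope turns from + to -."""
    peaks = []
    for i in range(1, len(ys) - 1):
        slope_left = (ys[i] - ys[i - 1]) / (xs[i] - xs[i - 1])
        slope_right = (ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i])
        if slope_left > 0 and slope_right <= 0:
            peaks.append(xs[i])
    return peaks


# Hypothetical data with two local peaks
xs = [0, 1, 2, 3, 4, 5, 6]
ys = [0, 1, 3, 2, 4, 6, 5]
found = local_peaks(xs, ys)
```

On noisy data this reports every small bump, so it is usually applied to a smoothed or fitted curve rather than to the raw points.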

Upvotes: 1
