Niels Basjes

Reputation: 10642

Interpolating data points in Excel

I'm sure this is the kind of problem others have solved many times before.

A group of people are going to take measurements (home energy usage, to be exact). Each of them will do so at different times and at different intervals.

So what I'll get from each person is a set of {date, value} pairs where there are dates missing in the set.

What I need is a complete set of {date, value} pairs in which every date within the range has a value (either measured or calculated). I expect that a simple linear interpolation would suffice for this project.

Assuming it must be done in Excel, what is the best way to interpolate such a dataset (so that I have a value for every day)?

Thanks.

NOTE: When these datasets are complete I'll determine the slope (i.e. usage per day) and from that we can start doing home-to-home comparisons.

ADDITIONAL INFO After first few suggestions: I do not want to manually figure out where the holes are in my measurement set (too many incomplete measurement sets!!). I'm looking for something (existing) automatic to do that for me. So if my input is

{2009-06-01,  10}
{2009-06-03,  20}
{2009-06-06, 110}

Then I expect to automatically get

{2009-06-01,  10}
{2009-06-02,  15}
{2009-06-03,  20}
{2009-06-04,  50}
{2009-06-05,  80}
{2009-06-06, 110}

Yes, I can write software that does this. I am just hoping that someone already has a "ready to run" software (Excel) feature for this (rather generic) problem.
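
For reference, the expected output above is plain piecewise linear interpolation over calendar days. A minimal Python sketch of that calculation follows; it is not an Excel feature, the function name `fill_daily` is hypothetical, and it assumes the input is non-empty and already sorted by date:

```python
from datetime import date, timedelta

def fill_daily(pairs):
    """Return one (date, value) pair per day, linearly interpolating the gaps.

    `pairs` is a non-empty list of (date, value) tuples sorted by date.
    """
    filled = []
    for (d0, v0), (d1, v1) in zip(pairs, pairs[1:]):
        span = (d1 - d0).days
        # Emit the left measurement and every missing day up to (not including) d1.
        for offset in range(span):
            filled.append((d0 + timedelta(days=offset), v0 + (v1 - v0) * offset / span))
    filled.append(pairs[-1])  # the last measurement closes the series
    return filled

measurements = [
    (date(2009, 6, 1), 10),
    (date(2009, 6, 3), 20),
    (date(2009, 6, 6), 110),
]
for day, value in fill_daily(measurements):
    print(day.isoformat(), value)
```

Run on the three sample measurements, this yields six daily rows with the values 10, 15, 20, 50, 80 and 110, matching the expected set listed above.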

Upvotes: 20

Views: 99265

Answers (7)

alexkovelsky

Reputation: 4190

You can find out which formula best fits your data using Excel's "trend line" feature. With that formula, you can calculate y for any x.

  1. Create a scatter (XY) chart for your data (Insert => Scatter);
  2. Create a Polynomial or Moving Average trend line and check "Display Equation on chart" (right-click on series => Add Trend Line);
  3. Copy the equation into a cell and replace the x's with your desired x value.

In the screenshot below, A12:A16 holds the x's, B12:B16 holds the y's, and C12 contains the formula that calculates y for any x.

[Screenshot: Excel Interpolation]

I first posted an answer here, but later found this question

Upvotes: 1

Deniss

Reputation: 146

The easiest way to do it probably is as follows:

  1. Download Excel add-on here: XlXtrFun™ Extra Functions for Microsoft Excel

  2. Use the function Interpolate(): =Interpolate($A$1:$A$3,$B$1:$B$3,D1,FALSE,FALSE)

Columns A and B should contain your input, and column D (referenced as D1 in the formula) should contain all your date values. The formula goes into column E.

Upvotes: 5

YGA

Reputation: 10010

I came across this and was reluctant to use an add-in because it makes it tough to share the sheet with people who don't have the add-in installed.

My officemate designed a clean formula that is relatively compact (at the expense of using a bit of magic).

Things to note:

  • The formula works by:

    • using the MATCH function to find the row in the inputs range just before the value being searched for (e.g. 3 is the value just before 3.5)
    • using OFFSETs to select the square of that line and the next (in light purple)
    • using FORECAST to build a linear interpolation using just those two points, and getting the result
  • This formula cannot do extrapolations; make sure that your search value is between the endpoints (I do this in the example below by having extreme values).

Not sure if this is too complicated for folks; but it had the benefit of being very portable (and simpler than many alternate solutions).

If you want to copy-paste the formula, it is:

=FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1))

(inputs being a named range)
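
To make the logic of the formula easier to follow outside Excel, here is a minimal Python sketch of the same idea: find the last known x that is not greater than the search value (the role MATCH plays), take that row and the next one (the role of the OFFSETs), and draw a straight line through the two points (which is what FORECAST reduces to when given exactly two points). The function name `interpolate` and the sample numbers are hypothetical, and the sketch assumes the x values are sorted in ascending order:

```python
from bisect import bisect_right

def interpolate(xs, ys, x):
    """Linear interpolation between the two known points that bracket x.

    xs must be sorted ascending and the same length as ys.
    """
    i = bisect_right(xs, x) - 1  # last index with xs[i] <= x, like MATCH(x, xs)
    if i < 0 or i + 1 >= len(xs):
        # No bracketing pair: x is below the first point, or at/after the last one.
        raise ValueError("x must satisfy xs[0] <= x < xs[-1]")
    x0, x1 = xs[i], xs[i + 1]
    y0, y1 = ys[i], ys[i + 1]
    # Straight line through (x0, y0) and (x1, y1), evaluated at x.
    return y0 + (x - x0) * (y1 - y0) / (x1 - x0)

xs = [1, 2, 3, 4]      # hypothetical known x values
ys = [10, 20, 40, 80]  # hypothetical known y values
print(interpolate(xs, ys, 3.5))
```

Like the spreadsheet formula, the sketch refuses to extrapolate: a search value outside the known range, or exactly at the last known x, has no "next row" to pair with and raises an error instead of returning a value.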

Upvotes: 30

DakotaD

Reputation: 371

The answer above by YGA doesn't handle end-of-range cases where the desired X value is the same as the last X value in the reference range. Using the example given by YGA, the Excel formula would return a #DIV/0! error if an interpolated value at 9999 were requested. This is presumably part of the reason why YGA added the extreme endpoints of 9999 and -9999 to the input data range and then assumed that all forecasted values lie between these two numbers. If such padding is undesired or not possible, another way to avoid a #DIV/0! error is to check for an exact input value match using the following formula:

=IF(ISNA(MATCH(F3,inputs,0)),FORECAST(F3,OFFSET(inputs,MATCH(F3,inputs)-1,1,2,1),OFFSET(inputs,MATCH(F3,inputs)-1,0,2,1)),OFFSET(inputs,MATCH(F3,inputs)-1,1,1,1))

where F3 is the value where interpolated results are wanted.

Note: I would have just added this as a comment to the original YGA post, but I don't have enough reputation points yet.

Upvotes: 2

darren

Reputation: 11

Alternatively:

=INDEX(yVals,MATCH(J7,xVals,1))+(J7-INDEX(xVals,MATCH(J7,xVals,1)))*(INDEX(yVals,MATCH(J7,xVals,1)+1)-INDEX(yVals,MATCH(J7,xVals,1)))/(INDEX(xVals,MATCH(J7,xVals,1)+1)-INDEX(xVals,MATCH(J7,xVals,1)))

where J7 is the x value, xVals is the range of x values, and yVals is the range of y values.

It is easier to put this into code.

Upvotes: 1

Stewbob

Reputation: 16899

A nice graphical way to see how well your interpolated results fit:

Take your date,value pairs and graph them using the XY chart in Excel (not the Line chart). Right-click on the resulting line on the graph and click 'Add trendline'. There are lots of different options to choose which type of curve fitting is used. Then you can go to the properties of the newly created trendline and display the equation and the R-squared value.

Make sure that when you format the trendline Equation label, you set the numerical format to have a high degree of precision, so that all of the significant digits of the equation constants are displayed.

Upvotes: 2

Bill the Lizard

Reputation: 405715

There are two functions, LINEST and TREND, that you can try. Both fit a least-squares straight line to sets of known Xs and Ys. The difference is in what they return: LINEST returns the coefficients of the fitted line (slope and intercept), while TREND takes new X values and returns the corresponding Y values on that line.
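
Note that a least-squares line summarizes all points at once and generally does not pass through the measurements, so it serves the "usage per day" slope mentioned in the question better than it serves day-by-day gap filling. A minimal Python sketch of such a fit, using the question's sample data with the dates converted to day offsets from 2009-06-01 (a conversion assumed here for illustration); `fit_line` is a hypothetical helper, not an Excel function:

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

days = [0, 2, 5]       # 2009-06-01, 2009-06-03, 2009-06-06 as day offsets
usage = [10, 20, 110]  # measured values from the question
slope, intercept = fit_line(days, usage)
print(round(slope, 2), round(intercept, 2))

# The fitted line does not reproduce the measured value on day 0.
print(round(slope * 0 + intercept, 2), "vs measured", usage[0])
```

The slope is the average usage per day across the whole set, while the fitted value on the first day differs noticeably from the measured 10.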

Upvotes: 6
