Amit Kohli
Amit Kohli

Reputation: 2950

R: Why isn't this matrix 3d linear interpolation working correctly?

I have a matrix of values and zeros, where zero= NA. The values are interspersed around the matrix, and what I want to do is interpolate the values of all the NA values. This is the data:

enter image description here

I'm trying to guess all of these values by taking all the known values in my matrix, and multiplying the value by the distance (such that the further away a point is, the less influence it has). This is what the interpolated result looks like: enter image description here

As you can see, this method is not very effective, it does affect the NAs nearest to the known values, but then they quickly converge onto an average value. I think this is due to the fact that it's taking the ENTIRE RANGE, which has many ups and downs... rather than just the points nearest to it.

Obviously, matrix operations aren't my specialty... what do I need to change to correctly do the linear-interpolation?

Here's the code:

library(dplyr)
library(plotly)

Cont <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 1816, 2320, 1406, 2028, 1760, 1932, 1630, 
                    1835, 1873, 1474, 1671, 2073, 1347, 2131, 2038, 1969, 2036, 1602, 
                    1986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 2311, 1947, 2094, 1947, 2441, 1775, 1461, 1260, 
                    1494, 2022, 1863, 1587, 2082, 1567, 1770, 2065, 1404, 1809, 1972, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
                    0, 0, 0, 2314, 1595, 2065, 1870, 2178, 1410, 1994, 1979, 2111, 
                    1531, 1917, 1559, 2109, 1921, 1606, 1469, 1601, 1771, 1771), .Dim = c(19L, 
                                                                                       30L))

  ## First get real control values
  idx <- which(Cont > 0, arr.ind=TRUE)
  V <- Cont[idx]
  ControlValues <- data.frame(idx,V)

  ## Make data.frame of values to fill
  toFill <- which(Cont == 0, arr.ind=TRUE) %>% as.data.frame
  toFill$V <- 0

  ## And now figure out the weighted value of each point
  for (i in 1:nrow(toFill)){
    toFill[i,] -> CurrentPoint

    Xs <- (1/abs(CurrentPoint[,1] - ControlValues[,1])) 
    Xs[is.infinite(Xs)] <- 0
    Xs <- Xs/sum(Xs)/100

    Ys <- (1/abs(CurrentPoint[,2] - ControlValues[,2])) 
    Ys[is.infinite(Ys)] <- 0
    Ys <- Ys/sum(Ys)/100

    ControlValues1 <- data.frame(Xs,Ys)
    toFill[i,3] <- sum(rowMeans(ControlValues1) * ControlValues$V)*100
  }

  ## add back in the controls and reorder
  bind_rows(ControlValues,toFill) -> Both
  Both %>% arrange(row,col) -> Both

  ## and plot the new surface
  NewCont <- matrix(Both$V,max(Both$row),max(Both$col),byrow = T)
  plot_ly(z=NewCont, type="surface",showscale=FALSE) 

Upvotes: 2

Views: 1360

Answers (1)

aichao
aichao

Reputation: 7455

One approach to interpolate and extrapolate data in R is to use the akima package. The following performs bi-linear interpolation followed by extrapolation using as input the known data points in the data frame ControlValues to fill the zeroes in Cont.

library(akima)
library(plotly)

NewCont <- akima::interp(x=ControlValues[,1], y=ControlValues[,2], z=ControlValues[,3],
                         xo=1:nrow(Cont), yo=1:ncol(Cont), linear=TRUE)$z
NewCont[,1:9] <- akima::interp.old(x=ControlValues[,1], y=ControlValues[,2], 
                                   z=ControlValues[,3], xo=1:nrow(Cont), 
                                   yo=1:9, ncp=2, extrap=TRUE)$z

plot_ly(z=NewCont, type="surface",showscale=FALSE)

Notes:

  1. The first call to akima::interp performs the bi-linear interpolation. See the help page ?akima::interp for usage and details.

    • A key point is that the inputs x, y, and z for the known data points need not be on a x-y grid. In this case, these are the columns of ControlValues.
    • The output of akima::interp is a list whose z component is a matrix of interpolated values over the grid whose x and y coordinates are defined by the inputs xo and yo, respectively. In this case, these are just the row and column indices of Cont
    • As stated in the help page

    z-values for points outside the convex hull are returned as NA.

    In this case, the first nine columns of the output corresponding to yo=1:9 will be NAs.

  2. The second call to akima::interp (actually akima::interp.old) performs the data extrapolation to fill in the NAs left by the first call. See this SO quation/answer for the details of this usage.

The above approach gives the following result

NewCont

Another approach to perform bi-linear interpolation is to use the interp.surface function in the fields package. This approach is mentioned because the implementation is an R-script, which can be listed by typing the function name interp.surface at the R command line.

library(fields)

loc <- make.surface.grid(list(x=1:nrow(Cont), y=1:ncol(Cont)))
NewCont2 <- matrix(interp.surface(list(x=sort(unique(ControlValues[,1])),
                                       y=sort(unique(ControlValues[,2])),
                                       z=matrix(ControlValues[,3],
                                                nrow=length(unique(ControlValues[,1])),
                                                ncol=length(unique(ControlValues[,2])))),
                                  loc), nrow=nrow(Cont), ncol=ncol(Cont))
NewCont2[,1:9] <- akima::interp.old(x=ControlValues[,1], y=ControlValues[,2], 
                                    z=ControlValues[,3], xo=1:nrow(Cont), 
                                    yo=1:9, ncp=2, extrap=TRUE)$z

Here, the requirements are the opposite to those for akima::interp. Specifically, the known data points must lie on a x-y grid. However, the coordinates to interpolate need not be on a grid and is instead a matrix containing corresponding column vectors of x and y coordinates where each tuple (x[i],y[i]) is a x-y coordinate to interpolate. Since the data points in ControlValues are on a grid, these requirements are also satisfied for this case. See the help page ?interp.surface for usage and details.

Notes:

  1. sort(unique(ControlValues[,1])) and sort(unique(ControlValues[,2])) simply gives the x and y coordinates for the grid of known data points
  2. The z component in the list is simply the z values for the known data points reshaped as a matrix over the grid of known data points
  3. The matrix of coordinates to interpolate is generated by make.surface.grid using as x and y coordinates the row and column indices of Conf, respectively
  4. A coordinate to interpolate that lies outside the grid of known points will result in a interpolated value of NA
  5. interp.surface returns a vector of z values corresponding to the coordinates to interpolate. This is then rehaped to a matrix over the grid of coordinates to interpolate, which has dimensions nrow(Cont) by ncol(Cont)

Finally, it is easy to verify that the two approaches give the same result

print(max(abs(NewCont - NewCont2)))
##[1] 4.547474e-13

Upvotes: 1

Related Questions