Reputation: 2950
I have a matrix of values and zeros, where zero= NA
. The values are interspersed around the matrix, and what I want to do is interpolate the values of all the NA
values. This is the data:
I'm trying to guess all of these values by taking all the known values in my matrix, and multiplying the value by the distance (such that the further away a point is, the less influence it has). This is what the interpolated result looks like:
As you can see, this method is not very effective, it does affect the NA
s nearest to the known values, but then they quickly converge onto an average value. I think this is due to the fact that it's taking the ENTIRE RANGE, which has many ups and downs... rather than just the points nearest to it.
Obviously, matrix operations aren't my specialty... what do I need to change to correctly do the linear-interpolation?
Here's the code:
library(dplyr)
library(plotly)
Cont <- structure(c(0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 1816, 2320, 1406, 2028, 1760, 1932, 1630,
1835, 1873, 1474, 1671, 2073, 1347, 2131, 2038, 1969, 2036, 1602,
1986, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 2311, 1947, 2094, 1947, 2441, 1775, 1461, 1260,
1494, 2022, 1863, 1587, 2082, 1567, 1770, 2065, 1404, 1809, 1972,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 2314, 1595, 2065, 1870, 2178, 1410, 1994, 1979, 2111,
1531, 1917, 1559, 2109, 1921, 1606, 1469, 1601, 1771, 1771), .Dim = c(19L,
30L))
## First get real control values
idx <- which(Cont > 0, arr.ind=TRUE)
V <- Cont[idx]
ControlValues <- data.frame(idx,V)
## Make data.frame of values to fill
toFill <- which(Cont == 0, arr.ind=TRUE) %>% as.data.frame
toFill$V <- 0
## And now figure out the weighted value of each point
for (i in 1:nrow(toFill)){
toFill[i,] -> CurrentPoint
Xs <- (1/abs(CurrentPoint[,1] - ControlValues[,1]))
Xs[is.infinite(Xs)] <- 0
Xs <- Xs/sum(Xs)/100
Ys <- (1/abs(CurrentPoint[,2] - ControlValues[,2]))
Ys[is.infinite(Ys)] <- 0
Ys <- Ys/sum(Ys)/100
ControlValues1 <- data.frame(Xs,Ys)
toFill[i,3] <- sum(rowMeans(ControlValues1) * ControlValues$V)*100
}
## add back in the controls and reorder
bind_rows(ControlValues,toFill) -> Both
Both %>% arrange(row,col) -> Both
## and plot the new surface
NewCont <- matrix(Both$V,max(Both$row),max(Both$col),byrow = T)
plot_ly(z=NewCont, type="surface",showscale=FALSE)
Upvotes: 2
Views: 1360
Reputation: 7455
One approach to interpolate and extrapolate data in R is to use the akima
package. The following performs bi-linear interpolation followed by extrapolation using as input the known data points in the data frame ControlValues
to fill the zeroes in Cont
.
library(akima)
library(plotly)
NewCont <- akima::interp(x=ControlValues[,1], y=ControlValues[,2], z=ControlValues[,3],
xo=1:nrow(Cont), yo=1:ncol(Cont), linear=TRUE)$z
NewCont[,1:9] <- akima::interp.old(x=ControlValues[,1], y=ControlValues[,2],
z=ControlValues[,3], xo=1:nrow(Cont),
yo=1:9, ncp=2, extrap=TRUE)$z
plot_ly(z=NewCont, type="surface",showscale=FALSE)
Notes:
The first call to akima::interp
performs the bi-linear interpolation. See the help page ?akima::interp
for usage and details.
x
, y
, and z
for the known data points need not be on a x-y
grid. In this case, these are the columns of ControlValues
. akima::interp
is a list whose z
component is a matrix of interpolated values over the grid whose x
and y
coordinates are defined by the inputs xo
and yo
, respectively. In this case, these are just the row and column indices of Cont
z-values for points outside the convex hull are returned as
NA
.
In this case, the first nine columns of the output corresponding to yo=1:9
will be NA
s.
The second call to akima::interp
(actually akima::interp.old
) performs the data extrapolation to fill in the NA
s left by the first call. See this SO quation/answer for the details of this usage.
The above approach gives the following result
Another approach to perform bi-linear interpolation is to use the interp.surface
function in the fields
package. This approach is mentioned because the implementation is an R-script, which can be listed by typing the function name interp.surface
at the R command line.
library(fields)
loc <- make.surface.grid(list(x=1:nrow(Cont), y=1:ncol(Cont)))
NewCont2 <- matrix(interp.surface(list(x=sort(unique(ControlValues[,1])),
y=sort(unique(ControlValues[,2])),
z=matrix(ControlValues[,3],
nrow=length(unique(ControlValues[,1])),
ncol=length(unique(ControlValues[,2])))),
loc), nrow=nrow(Cont), ncol=ncol(Cont))
NewCont2[,1:9] <- akima::interp.old(x=ControlValues[,1], y=ControlValues[,2],
z=ControlValues[,3], xo=1:nrow(Cont),
yo=1:9, ncp=2, extrap=TRUE)$z
Here, the requirements are the opposite to those for akima::interp
. Specifically, the known data points must lie on a x-y
grid. However, the coordinates to interpolate need not be on a grid and is instead a matrix containing corresponding column vectors of x
and y
coordinates where each tuple (x[i],y[i])
is a x-y
coordinate to interpolate. Since the data points in ControlValues
are on a grid, these requirements are also satisfied for this case. See the help page ?interp.surface
for usage and details.
Notes:
sort(unique(ControlValues[,1]))
and sort(unique(ControlValues[,2]))
simply gives the x
and y
coordinates for the grid of known data pointsz
component in the list is simply the z
values for the known data points reshaped as a matrix over the grid of known data pointsmake.surface.grid
using as x
and y
coordinates the row and column indices of Conf
, respectivelyNA
interp.surface
returns a vector of z
values corresponding to the coordinates to interpolate. This is then rehaped to a matrix over the grid of coordinates to interpolate, which has dimensions nrow(Cont)
by ncol(Cont)
Finally, it is easy to verify that the two approaches give the same result
print(max(abs(NewCont - NewCont2)))
##[1] 4.547474e-13
Upvotes: 1