boshek
boshek

Reputation: 4406

Why is akima::interp interpolating outside the bounds of the original data? - R

If I am using the quakes dataset and the tidyverse to plot an interpolation using the akima package, I am attempting to do it like this:

library(tidyverse)
library(akima)

A function to interpolate:

## Interpolation and convert to a dataframe
dpinterp <- function(x,y,z) {
  interp_df <- interp(x = x, y = y, z = z, duplicate = "strip", extrap = FALSE, nx = 100, ny = 100)
  interp2xyz(interp_df, data.frame=TRUE)
}

Find the max depth for each stations variable. Choosing the stations subset less than 34 for ease of analysis:

quakes_sub <- quakes %>%
  filter(stations <= 34) %>%
  group_by(stations) %>%
  summarise(depth = max(depth)) %>%
  mutate(mag = 4)

A pipe to complete the interpolation then some data cleaning/wrangling:

quakes_interp <- quakes %>%
  filter(stations <= 34) %>%
  do(dpinterp(x = .$stations, y = .$depth, z = .$mag)) %>%
  filter(!is.na(z)) %>%
  rename(stations = x, depth = y, mag = z) 

A plot to visualization and illustrate how the interpolation extends beyond the raw data points. Points in blue:

quakes_interp %>%
  ggplot(aes(x = stations, y = depth, z = mag, fill = mag)) + 
  geom_tile() + 
  scale_y_reverse(expand = c(0,0)) +
  scale_x_continuous(expand = c(0, 0)) +
  #geom_vline(data = quakes_sub, aes(xintercept = stations, colour= depth)) +
  geom_point(data = quakes_sub, aes(x = stations, y = depth, colour = mag)) +
  #stat_contour(aes(fill=..level..), geom="polygon", binwidth=0.005) + 
  #geom_contour(color = "white", alpha = 0.5) + 
  #geom_text(data = meta_data, aes(label = Station)) +
  scale_fill_distiller(palette="RdYlGn", 
                       na.value="white") + 
  theme_minimal() 

enter image description here

Question: Is there a way to constrain an interpolation such that it does not extend beyond the original raw data?

Session Info

R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1

Matrix products: default

locale:
[1] LC_COLLATE=English_Canada.1252  LC_CTYPE=English_Canada.1252    LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C                    LC_TIME=English_Canada.1252    

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] bindrcpp_0.2       padr_0.3.0         lubridate_1.6.0    akima_0.6-2        readxl_1.0.0       dplyr_0.7.1       
 [7] purrr_0.2.2.2      readr_1.1.1        tidyr_0.6.3        tibble_1.3.3       ggplot2_2.2.1.9000 tidyverse_1.1.1   

loaded via a namespace (and not attached):
 [1] reshape2_1.4.2     haven_1.0.0        lattice_0.20-35    colorspace_1.3-2   htmltools_0.3.6    yaml_2.1.14       
 [7] rlang_0.1.1        foreign_0.8-67     glue_1.1.1         RColorBrewer_1.1-2 sp_1.2-4           modelr_0.1.0      
[13] fortunes_1.5-4     bindr_0.1          plyr_1.8.4         stringr_1.2.0      munsell_0.4.3      gtable_0.2.0      
[19] cellranger_1.1.0   rvest_0.3.2        psych_1.7.5        evaluate_0.10      labeling_0.3       knitr_1.16        
[25] forcats_0.2.0      parallel_3.4.0     broom_0.4.2        Rcpp_0.12.11       scales_0.4.1       backports_1.1.0   
[31] jsonlite_1.5       mnormt_1.5-5       hms_0.3            digest_0.6.12      stringi_1.1.5      grid_3.4.0        
[37] rprojroot_1.2      tools_3.4.0        magrittr_1.5       lazyeval_0.2.0     pkgconfig_2.0.1    swtext_0.0.1      
[43] xml2_1.1.1         assertthat_0.2.0   rmarkdown_1.6      httr_1.2.1         R6_2.2.1           nlme_3.1-131      
[49] compiler_3.4.0    

Upvotes: 1

Views: 1108

Answers (1)

Spacedman
Spacedman

Reputation: 94182

The data you are interpolating from comes from quakes with stations less than or equal to 34:

quakes_interp <- quakes %>%
  filter(stations <= 34) %>%
  do(dpinterp(x = .$stations, y = .$depth, z = .$mag))

Let's just extract those:

> qdata = quakes %>% filter(stations <=34)

And now lets plot the locations that interp has interpolated at:

> plot(quakes_interp$stations, quakes_interp$depth)

And now add the data points on top:

> points(qdata$stations,qdata$depth,col="red",pch=19)

And for good measure add the convex hull of the original data points:

> lines(
    qdata[
        chull(qdata$stations, qdata$depth),
        c("stations","depth")],lwd=2)

enter image description here

(upside-down compared to yours)

Your question was "Is there a way to constrain an interpolation such that it does not extend beyond the original raw data?" and I seem to have demonstrated that it doesn't, where the definition of "beyond" is the convex hull.

If you did a separate interp for each station then interp wouldn't go beyond the min and max value for depth, because you would have a one-dimensional case (mag~depth) and the convex hull in one dimension is defined by the min and max of the dimension.

interp is designed for interpolating on continuous coordinates. Your thinking that it shouldn't go beyond the max(depth) for each station is not applicable. Station is, to interps eyes, a continuous coordinate like coordinates in space. In most cases with continuous coordinates the max(x) for any y is going to be the one value of x that goes with that y, because y, being continuous, has only unique values.

The convex hull is one answer to the question of "what is the region of interest for this set of coordinate points?". Other answers may be "the rectangular bounding box [min(x),max(x),min(y),max(y)]" or "The concave hull alpha shape defined by alpha=0.2", or "the coastline of France". If you want to restrict your region of interest that only extends as far as the max(depth) for each station then you need to clip the output of interp manually.

But interp may be totally inappropriate here - if stations is discrete on integers then the value of an interpolation at stations==3.75 may be meaningless.

Actually, interp is probably even wronger. It interpolates based on equality of the coordinates, such that a difference of 1 station is the same as a difference of 1 depth unit. This explains the appearance of horizontal stripes in your plot. If you do the plot with an aspect ratio of 1 by adding coord_equal() to your ggplot you'll see the actual 2d surface that interp is working with.

Upvotes: 3

Related Questions