Reputation: 4406
If I am using the quakes
dataset and the tidyverse
to plot an interpolation using the akima
package, I am attempting to do it like this:
library(tidyverse)
library(akima)
A function to interpolate:
## Interpolation and convert to a dataframe
dpinterp <- function(x,y,z) {
interp_df <- interp(x = x, y = y, z = z, duplicate = "strip", extrap = FALSE, nx = 100, ny = 100)
interp2xyz(interp_df, data.frame=TRUE)
}
Find the max depth for each stations
variable. Choosing the stations
subset less than 34 for ease of analysis:
quakes_sub <- quakes %>%
filter(stations <= 34) %>%
group_by(stations) %>%
summarise(depth = max(depth)) %>%
mutate(mag = 4)
A pipe to complete the interpolation then some data cleaning/wrangling:
quakes_interp <- quakes %>%
filter(stations <= 34) %>%
do(dpinterp(x = .$stations, y = .$depth, z = .$mag)) %>%
filter(!is.na(z)) %>%
rename(stations = x, depth = y, mag = z)
A plot to visualization and illustrate how the interpolation extends beyond the raw data points. Points in blue:
quakes_interp %>%
ggplot(aes(x = stations, y = depth, z = mag, fill = mag)) +
geom_tile() +
scale_y_reverse(expand = c(0,0)) +
scale_x_continuous(expand = c(0, 0)) +
#geom_vline(data = quakes_sub, aes(xintercept = stations, colour= depth)) +
geom_point(data = quakes_sub, aes(x = stations, y = depth, colour = mag)) +
#stat_contour(aes(fill=..level..), geom="polygon", binwidth=0.005) +
#geom_contour(color = "white", alpha = 0.5) +
#geom_text(data = meta_data, aes(label = Station)) +
scale_fill_distiller(palette="RdYlGn",
na.value="white") +
theme_minimal()
R version 3.4.0 (2017-04-21)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 7 x64 (build 7601) Service Pack 1
Matrix products: default
locale:
[1] LC_COLLATE=English_Canada.1252 LC_CTYPE=English_Canada.1252 LC_MONETARY=English_Canada.1252
[4] LC_NUMERIC=C LC_TIME=English_Canada.1252
attached base packages:
[1] stats graphics grDevices utils datasets methods base
other attached packages:
[1] bindrcpp_0.2 padr_0.3.0 lubridate_1.6.0 akima_0.6-2 readxl_1.0.0 dplyr_0.7.1
[7] purrr_0.2.2.2 readr_1.1.1 tidyr_0.6.3 tibble_1.3.3 ggplot2_2.2.1.9000 tidyverse_1.1.1
loaded via a namespace (and not attached):
[1] reshape2_1.4.2 haven_1.0.0 lattice_0.20-35 colorspace_1.3-2 htmltools_0.3.6 yaml_2.1.14
[7] rlang_0.1.1 foreign_0.8-67 glue_1.1.1 RColorBrewer_1.1-2 sp_1.2-4 modelr_0.1.0
[13] fortunes_1.5-4 bindr_0.1 plyr_1.8.4 stringr_1.2.0 munsell_0.4.3 gtable_0.2.0
[19] cellranger_1.1.0 rvest_0.3.2 psych_1.7.5 evaluate_0.10 labeling_0.3 knitr_1.16
[25] forcats_0.2.0 parallel_3.4.0 broom_0.4.2 Rcpp_0.12.11 scales_0.4.1 backports_1.1.0
[31] jsonlite_1.5 mnormt_1.5-5 hms_0.3 digest_0.6.12 stringi_1.1.5 grid_3.4.0
[37] rprojroot_1.2 tools_3.4.0 magrittr_1.5 lazyeval_0.2.0 pkgconfig_2.0.1 swtext_0.0.1
[43] xml2_1.1.1 assertthat_0.2.0 rmarkdown_1.6 httr_1.2.1 R6_2.2.1 nlme_3.1-131
[49] compiler_3.4.0
Upvotes: 1
Views: 1108
Reputation: 94182
The data you are interpolating from comes from quakes
with stations
less than or equal to 34:
quakes_interp <- quakes %>%
filter(stations <= 34) %>%
do(dpinterp(x = .$stations, y = .$depth, z = .$mag))
Let's just extract those:
> qdata = quakes %>% filter(stations <=34)
And now lets plot the locations that interp
has interpolated at:
> plot(quakes_interp$stations, quakes_interp$depth)
And now add the data points on top:
> points(qdata$stations,qdata$depth,col="red",pch=19)
And for good measure add the convex hull of the original data points:
> lines(
qdata[
chull(qdata$stations, qdata$depth),
c("stations","depth")],lwd=2)
(upside-down compared to yours)
Your question was "Is there a way to constrain an interpolation such that it does not extend beyond the original raw data?" and I seem to have demonstrated that it doesn't, where the definition of "beyond" is the convex hull.
If you did a separate interp
for each station then interp
wouldn't go beyond the min and max value for depth
, because you would have a one-dimensional case (mag~depth) and the convex hull in one dimension is defined by the min and max of the dimension.
interp
is designed for interpolating on continuous coordinates. Your thinking that it shouldn't go beyond the max(depth) for each station is not applicable. Station is, to interp
s eyes, a continuous coordinate like coordinates in space. In most cases with continuous coordinates the max(x)
for any y
is going to be the one value of x
that goes with that y
, because y
, being continuous, has only unique values.
The convex hull is one answer to the question of "what is the region of interest for this set of coordinate points?". Other answers may be "the rectangular bounding box [min(x),max(x),min(y),max(y)]" or "The concave hull alpha shape defined by alpha=0.2", or "the coastline of France". If you want to restrict your region of interest that only extends as far as the max(depth) for each station then you need to clip the output of interp
manually.
But interp
may be totally inappropriate here - if stations
is discrete on integers then the value of an interpolation at stations==3.75
may be meaningless.
Actually, interp
is probably even wronger. It interpolates based on equality of the coordinates, such that a difference of 1 station is the same as a difference of 1 depth unit. This explains the appearance of horizontal stripes in your plot. If you do the plot with an aspect ratio of 1 by adding coord_equal()
to your ggplot
you'll see the actual 2d surface that interp
is working with.
Upvotes: 3