user5818045

Reputation: 13

(Linear) Interpolation in R for data frame (ddply)

I need to interpolate annual values from data recorded at 5-year intervals. So far I have found how to do it for a single observation using approx(). But I have a large data set, and when I try to use ddply() to apply it to each row, I keep getting error messages no matter what I try in the last line of code.

For example:

   town <- data.frame(name = c("a","b","c"), X1990 = c(100,300,500), X1995=c(200,400,700))
   d1990 <-c(1990)
   d1995 <-c(1995)
   town_all <- cbind(town,d1990,d1995)


    library(plyr)
    Input <- data.frame(town_all)
    x <- c(town_all$X1990, town_all$X1995)
    y <- c(town_all$d1990, town_all$d1995)
    approx_frame <- function(df) (approx(x=x, y=y, method="linear", n=6, ties="mean"))
    ddply(Input, town_all$X1990, approx_frame)

Also, if you know of a function that does geometric interpolation, that would be great. (I was only able to find examples of the spline and constant methods.)

Upvotes: 1

Views: 2302

Answers (1)

Rorschach

Reputation: 32466

I would first put the data in long format, with one column per variable: here, one column for 'year' and one for 'value'. I use data.table below, but the same approach works with dplyr or any other split-apply-combine tool. The interp function does geometric interpolation, using a constant growth rate computed separately for each interval.
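To make the constant-rate idea concrete, here is a small sketch for a single interval, using town 'a' from the question (100 in 1990, 200 in 1995). The variable names are illustrative only, and the linear column is included just for comparison.

```r
# Geometric (constant-rate) interpolation over one interval.
# Values are those of town 'a' in the question; all names are illustrative.
y0 <- 100; y1 <- 200          # values at the interval endpoints
t0 <- 1990; t1 <- 1995        # endpoint years

r <- (log(y1) - log(y0)) / (t1 - t0)   # continuous growth rate per year
years <- t0:t1
geo <- y0 * exp(r * (years - t0))      # same ratio between consecutive years

# For comparison: linear interpolation adds the same amount each year
lin <- approx(x = c(t0, t1), y = c(y0, y1), xout = years)$y

data.frame(year = years, geometric = round(geo, 1), linear = lin)
```

The geometric series grows by the same factor each year, so it lies below the straight line between the two endpoints and meets it at 1990 and 1995.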

## Sample data (added one more year)
towns <- data.frame(name=c('a', 'b', 'c'),
                    x1990=c(100, 300, 500),
                    x1995=c(200, 400, 700),
                    x2000=c(555, 777, 999))

## First, transform data from wide -> long format, clean year column
library(data.table)                                                        # or use reshape2::melt
towns <- melt(as.data.table(towns), id.vars='name', variable.name='year')  # wide -> long
towns[, year := as.integer(sub('[[:alpha:]]', '', year))]                      # convert years to integers

## Function to interpolate at constant rate for each interval
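## Assumes yrs is sorted in increasing order and all values are positive
interp <- function(yrs, values) {
    tt <- diff(yrs)               # interval lengths
    N <- head(values, -1L)        # value at the start of each interval
    P <- tail(values, -1L)        # value at the end of each interval
    r <- (log(P) - log(N)) / tt   # rate for interval
    ## values for every year of an interval except its last one
    const_rate <- function(N, r, time) N*exp(r*(0:(time-1L)))
    list(year=seq.int(min(yrs), max(yrs), by=1L),
         value=c(unlist(Map(const_rate, N, r, tt)), tail(P, 1L)))  # append final observed value
}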

## geometric interpolation for each town
res <- towns[, interp(year, value), by=name]

## Plot
library(ggplot2)
ggplot(res, aes(year, value, color=name)) +
    geom_line(lwd=1.3) + theme_bw() +
    geom_point(data=towns, cex=2, color='black') +  # add points interpolated between
    scale_color_brewer(palette='Pastel1')
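For the linear case the question started from, a per-town version with approx() might look like the sketch below. It uses base R only and rebuilds the question's sample data so that it runs on its own. The names towns_wide, value_cols and linear_res are made up for this example. Note that in the question's attempt, approx_frame ignores its df argument and works on the global x and y, with the values passed as x and the years as y. Here the years are x and each row's values are y.

```r
# Linear interpolation per town with base R's approx().
# Sample data copied from the question; object names are illustrative.
towns_wide <- data.frame(name  = c("a", "b", "c"),
                         X1990 = c(100, 300, 500),
                         X1995 = c(200, 400, 700))

# Recover the years from the column names (assumes the "X<year>" pattern)
value_cols <- grep("^X[0-9]+$", names(towns_wide), value = TRUE)
yrs <- as.integer(sub("^X", "", value_cols))
all_yrs <- seq(min(yrs), max(yrs))

# Interpolate each row separately, then stack the results in long format
linear_res <- do.call(rbind, lapply(seq_len(nrow(towns_wide)), function(i) {
    vals <- unlist(towns_wide[i, value_cols])   # this town's observed values
    data.frame(name  = towns_wide$name[i],
               year  = all_yrs,
               value = approx(x = yrs, y = vals, xout = all_yrs)$y)
}))
linear_res
```

The result has one row per town and year, the same long layout as res above, so it could be plotted the same way.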


Upvotes: 1
