dleal
dleal

Reputation: 2314

R: Interpolate a data.table when x's are the names of columns and y's the rows

Let me exemplify:

library(data.table)
A <- data.table( value = c(1.3,2.1,2.7), '1' = c(0.4,0.3,0.5), '2' = c(1.1,1.3,1.7) , '3' = c(2.1,2.4,2.6) )

> A
   value   1   2   3
1:   1.3 0.4 1.1 2.1
2:   2.1 0.3 1.3 2.4
3:   2.7 0.5 1.7 2.6

I would like to use x = 1,2,3 and y being each row to interpolate the number in the columns "value".

So, for the first row, x = 1,2,3 and y = 0.4, 1.1, 2.1. The value that should be interpolated is x0 = 1.3. And so on for the next rows. I came up with the following function to apply it using a data.table by rows:

## Function to Interpolate
interpol <- function(dt){
  # Define X
  x <- c(1,2,3)
  # Grab each row of the data.table
  y <- as.numeric(dt[,c('1','2','3'), with = F])
  # Interpolate and get the value of Y
  approx(x,y,dt[,'value', with = F])$y
}

# Apply by row
A[, interpol(.SD), by = 1:nrow(A)]

The problem is that this seems to be extremely slow for a data.table of a few million rows. What would be the best way to optimize this?

Side Note

Originally, my problem was as follows:

I had to interpolate the same A using a different table B:

A2 <- data.table(name = LETTERS[1:3], value = c(1.3,2.1,2.7))
B2 <- data.table(name = LETTERS[1:3], '1' = c(0.4,0.3,0.5), '2' = c(1.1,1.3,1.7) , '3' = c(2.1,2.4,2.6) )

> A2
   name value
1:    A   1.3
2:    B   2.1
3:    C   2.7

> B2
   name   1   2   3
1:    A 0.4 1.1 2.1
2:    B 0.3 1.3 2.4
3:    C 0.5 1.7 2.6

I decided to merge these two data.tables to obtain the one above because i believed it would be easier. But maybe for it to run faster it might be better to have them as separate data.tables?

Upvotes: 1

Views: 1053

Answers (1)

Frank
Frank

Reputation: 66819

First, I'd suggest storing the B2 data in long format:

dat = melt(B2, id="name", value.name = "v")
dat[, variable := as.numeric(as.character(variable))]

   name variable   v
1:    A        1 0.4
2:    B        1 0.3
3:    C        1 0.5
4:    A        2 1.1
5:    B        2 1.3
6:    C        2 1.7
7:    A        3 2.1
8:    B        3 2.4
9:    C        3 2.6

From here, you can join on A2 to interpolate by group:

dat[A2, on=.(name), approx(variable, v, value)$y, by=.EACHI]

   name   V1
1:    A 0.61
2:    B 1.41
3:    C 2.33

Upvotes: 1

Related Questions