Reputation: 2314
Let me exemplify:
library(data.table)
A <- data.table( value = c(1.3,2.1,2.7), '1' = c(0.4,0.3,0.5), '2' = c(1.1,1.3,1.7) , '3' = c(2.1,2.4,2.6) )
> A
   value   1   2   3
1:   1.3 0.4 1.1 2.1
2:   2.1 0.3 1.3 2.4
3:   2.7 0.5 1.7 2.6
I would like to interpolate using x = 1,2,3 and y taken from each row, evaluating at the point given in the "value" column.
So, for the first row, x = 1,2,3 and y = 0.4, 1.1, 2.1, and the point to interpolate at is x0 = 1.3; and so on for the remaining rows. I came up with the following function to apply it row by row over the data.table (a quick check of the first row follows the code):
## Function to Interpolate
interpol <- function(dt){
  # Define x
  x <- c(1, 2, 3)
  # Grab the y values from this row of the data.table
  y <- as.numeric(dt[, c('1', '2', '3'), with = FALSE])
  # Interpolate at x0 = value and return the interpolated y
  approx(x, y, xout = as.numeric(dt[['value']]))$y
}
# Apply by row
A[, interpol(.SD), by = 1:nrow(A)]
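As a quick sanity check of the first row, with plain base-R approx and the numbers from the example above:
approx(x = c(1, 2, 3), y = c(0.4, 1.1, 2.1), xout = 1.3)$y
# [1] 0.61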
The problem is that this seems to be extremely slow for a data.table of a few million rows. What would be the best way to optimize this?
Originally, my problem was as follows:
I had to do the same interpolation, but with the value column and the y columns split across two tables, A2 and B2:
A2 <- data.table(name = LETTERS[1:3], value = c(1.3,2.1,2.7))
B2 <- data.table(name = LETTERS[1:3], '1' = c(0.4,0.3,0.5), '2' = c(1.1,1.3,1.7) , '3' = c(2.1,2.4,2.6) )
> A2
   name value
1:    A   1.3
2:    B   2.1
3:    C   2.7
> B2
   name   1   2   3
1:    A 0.4 1.1 2.1
2:    B 0.3 1.3 2.4
3:    C 0.5 1.7 2.6
I decided to merge these two data.tables to obtain the table at the top (roughly as sketched below), because I believed it would be easier. But maybe it would run faster to keep them as separate data.tables?
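For reference, a merge roughly along these lines produces the combined table shown at the top (a sketch, not necessarily the exact call I used):
# Join A2 and B2 on 'name', then drop the key column to get the value/1/2/3 layout
A <- merge(A2, B2, by = "name")[, name := NULL][]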
Upvotes: 1
Views: 1053
Reputation: 66819
First, I'd suggest storing the B2 data in long format:
# Reshape B2 to long format: one row per (name, x) pair
dat = melt(B2, id="name", value.name = "v")
# melt() returns 'variable' as a factor; convert it to the numeric x values
dat[, variable := as.numeric(as.character(variable))]
   name variable   v
1:    A        1 0.4
2:    B        1 0.3
3:    C        1 0.5
4:    A        2 1.1
5:    B        2 1.3
6:    C        2 1.7
7:    A        3 2.1
8:    B        3 2.4
9:    C        3 2.6
From here, you can join on A2 to interpolate by group:
dat[A2, on=.(name), approx(variable, v, value)$y, by=.EACHI]
   name   V1
1:    A 0.61
2:    B 1.41
3:    C 2.33
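If you want the interpolated values stored back on A2, one possible follow-up (using a hypothetical column name interp) would be:
# Same join as above, assigned as a new column on A2
A2[, interp := dat[A2, on = .(name), approx(variable, v, value)$y, by = .EACHI]$V1]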
Upvotes: 1