Anthony S.

Using values from a data frame as array indices

I've looked at previous questions on Stack Overflow, but haven't found a solution that works for the problem I'm having.

Basically, I have a data frame, which we'll call df, that looks like this:

    source   destination    year    ship    count
         1          1415       1       6        0
         1          1415       2       6        0
         1          1415       3       6        0
         1          1415       4       6        0
         1          1415       5       6        0
         1          1415       6       6        0

Here is copyable code, should you need it:

df <- structure(list(source = c(1L, 1L, 1L, 1L, 1L, 1L), destination = 
c(1415, 1415, 1415, 1415, 1415, 1415), year = 1:6, ship = c(6, 
6, 6, 6, 6, 6), count = c(0, 0, 0, 0, 0, 0)), .Names = c("source", 
"destination", "year", "ship", "count"), class = "data.frame", 
row.names = c(NA, 6L))

I also have a four-dimensional array, which we'll call m1. Each of the first four columns of df corresponds to one of the four dimensions of m1, so together they form an index into it. As you can probably guess by now, the fifth column of df should hold the value stored in m1 at that index.

So for example, df$count[3] <- m1[1,1415,3,6].

At the moment, the entire count column is empty (all zeros), and I'd like to fill it in. If it were a small task, I would just do it the slow and stupid way with a for-loop, but df has about 300,000,000 rows, and the dimensions of m1 are around 3900 x 3900 x 35 x 7. As a consequence, the following approach only got through 5% of the rows after running for a full day:

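for (line in 1:nrow(df)) {
   print(line / nrow(df))   # progress: fraction of rows processed so far
   # look up one element of m1 per row, using the row's four index columns
   df$count[line] <- m1[df$source[line], df$destination[line], df$year[line], df$ship[line]]
}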

Any ideas on how to do this in a faster way?

Answers (1)

A5C1D2H2I1M1N2O1R2T1

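As far as I can tell from your question, you're just looking for matrix indexing: when you index an array with a numeric matrix that has one column per dimension, each row of that matrix selects a single element.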

Consider the following simplified example.

First, your array (with 4 dimensions).

dim1 <- 2; dim2 <- 4; dim3 <- 2; dim4 <- 2
x <- dim1 * dim2 * dim3 * dim4

set.seed(1)  # sample() changed in R 3.6.0; the values printed below come from an earlier R version
M <- array(sample(x), dim = c(dim1, dim2, dim3, dim4))
M
## , , 1, 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]    9   18    6   29
## [2,]   12   27   25   17
## 
## , , 2, 1
## 
##      [,1] [,2] [,3] [,4]
## [1,]   16    5   14   20
## [2,]    2    4    8   32
## 
## , , 1, 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   31   28   24    7
## [2,]   15   11    3   23
## 
## , , 2, 2
## 
##      [,1] [,2] [,3] [,4]
## [1,]   13    1   21   30
## [2,]   19   26   22   10
## 
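R stores arrays in column-major order: the first subscript varies fastest, then the second, and so on. Matrix indexing is therefore equivalent to computing one linear index per row of the index matrix. The sketch below checks this using the same subscripts as the mydf built in the next step. linear_index is a hypothetical helper, not a base R function, and the array is filled with seq_len() instead of sample() (an assumption for illustration) so that each value equals its own linear index:

```r
# Hypothetical helper (not part of base R): convert a matrix of subscripts,
# one column per dimension, into column-major linear indices.
linear_index <- function(subs, dims) {
  # Stride of each dimension: 1, d1, d1*d2, d1*d2*d3, ...
  # Kept as doubles so results can exceed .Machine$integer.max.
  strides <- c(1, cumprod(as.numeric(dims))[-length(dims)])
  as.vector((subs - 1) %*% strides) + 1
}

dims <- c(2, 4, 2, 2)
A <- array(seq_len(prod(dims)), dim = dims)  # each value equals its linear index
subs <- cbind(c(1, 1, 2, 2), c(1, 1, 2, 3), c(1, 2, 1, 2), c(1, 1, 2, 1))

by_matrix <- A[subs]                      # matrix indexing
by_linear <- A[linear_index(subs, dims)]  # plain vector indexing
by_matrix
# [1]  1  9 20 14
identical(by_matrix, by_linear)
# [1] TRUE
```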

Second, your data.frame that has the indices of interest.

mydf <- data.frame(source = c(1, 1, 2, 2),
                   destination = c(1, 1, 2, 3),
                   year = c(1, 2, 1, 2),
                   ship = c(1, 1, 2, 1),
                   count = 0)
mydf
##   source destination year ship count
## 1      1           1    1    1     0
## 2      1           1    2    1     0
## 3      2           2    1    2     0
## 4      2           3    2    1     0
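The index matrix used in the extraction step is simply the first four columns converted with as.matrix(). The conversion matters: [ only treats its argument as per-element subscripts when it is a matrix with as many columns as the array has dimensions, and a data frame is a list rather than a matrix. A small check, rebuilding the same mydf:

```r
mydf <- data.frame(source = c(1, 1, 2, 2),
                   destination = c(1, 1, 2, 3),
                   year = c(1, 2, 1, 2),
                   ship = c(1, 1, 2, 1),
                   count = 0)

idx <- as.matrix(mydf[1:4])  # keep the four index columns, drop count
is.matrix(idx)
# [1] TRUE
dim(idx)                     # 4 rows, 4 columns: one column per array dimension
# [1] 4 4
```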

Third, extract the values by indexing M with a matrix built from the first four columns:

out <- M[as.matrix(mydf[1:4])]
out
# [1]  9 16 11  8

Fourth, compare:

M[1, 1, 1, 1]
# [1] 9
M[1, 1, 2, 1]
# [1] 16
M[2, 2, 1, 2]
# [1] 11
M[2, 3, 2, 1]
# [1] 8
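To fill the question's count column, the extracted values can simply be assigned back, e.g. df$count <- m1[as.matrix(df[1:4])]. With about 300,000,000 rows, though, as.matrix(df[1:4]) builds a 300,000,000 x 4 double matrix (roughly 9.6 GB) in one go, so working through the rows in chunks may be gentler on memory. A minimal sketch, assuming the columns are named as in the question; fill_counts and chunk_size are hypothetical names, and the default chunk size is an arbitrary choice rather than a tuned value:

```r
# Hypothetical helper: fill `count` by matrix indexing, one block of rows at
# a time, so at most `chunk_size` x 4 index values are materialized at once.
fill_counts <- function(df, arr, chunk_size = 1e6) {
  n <- nrow(df)
  if (n == 0) return(df)
  index_cols <- c("source", "destination", "year", "ship")
  counts <- numeric(n)                        # preallocated result vector
  for (start in seq(1, n, by = chunk_size)) {
    rows <- start:min(start + chunk_size - 1, n)
    idx <- as.matrix(df[rows, index_cols])    # subscript matrix for this chunk
    counts[rows] <- arr[idx]
  }
  df$count <- counts                          # assign the column once
  df
}

# Tiny demonstration with predictable values (each equals its linear index)
demo_arr <- array(seq_len(32), dim = c(2, 4, 2, 2))
demo_df <- data.frame(source = c(1, 1, 2, 2), destination = c(1, 1, 2, 3),
                      year = c(1, 2, 1, 2), ship = c(1, 1, 2, 1), count = 0)
demo_df <- fill_counts(demo_df, demo_arr, chunk_size = 3)
demo_df$count
# [1]  1  9 20 14
```

At the default chunk size, 300,000,000 rows take only about 300 loop iterations, so the per-row R overhead that made the original loop so slow should mostly disappear.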
