I've looked at previous questions on Stack Overflow, but haven't found a solution that works for the problem I'm having.
Basically, I have a data frame, which we'll call df, that looks like this:
source destination year ship count
1 1415 1 6 0
1 1415 2 6 0
1 1415 3 6 0
1 1415 4 6 0
1 1415 5 6 0
1 1415 6 6 0
Copyable code, should you need it:
df <- structure(list(source = c(1L, 1L, 1L, 1L, 1L, 1L), destination =
c(1415, 1415, 1415, 1415, 1415, 1415), year = 1:6, ship = c(6,
6, 6, 6, 6, 6), count = c(0, 0, 0, 0, 0, 0)), .Names = c("source",
"destination", "year", "ship", "count"), class = "data.frame",
row.names = c(NA, 6L))
I also have a four-dimensional array, which we'll call m1. Each of the first four columns of df corresponds to one of the four dimensions of m1, so together they form an index into it. As you can probably guess by now, the fifth column of df (count) corresponds to the value actually stored in m1 at that position.
So, for example, df$count[3] should equal m1[1, 1415, 3, 6].
At the moment, the entire count column is just placeholder zeros, and I'd like to fill it in. If it were a small task, I would just do it the slow and stupid way and use a for-loop, but df has about 300,000,000 rows, and m1 is roughly 3900 x 3900 x 35 x 7. As a consequence, the following approach had only got through 5% of the rows after running for a full day:
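for (line in 1:nrow(df)) {
  # progress indicator: fraction of rows processed so far
  print(line / nrow(df))
  # look up this row's value in m1 using its four index columns
  df$count[line] <- m1[df$source[line], df$destination[line], df$year[line], df$ship[line]]
}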
Any ideas on how to do this in a faster way?
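For a sense of scale, here is some rough arithmetic on the sizes involved. It treats the approximate dimensions from the question as exact and assumes m1 holds 8-byte doubles; the question doesn't state the storage type, so these figures are estimates only.

```r
# Approximate dimensions of m1 from the question, treated as exact here
dims <- c(3900, 3900, 35, 7)

# Total number of cells in m1 (about 3.7 billion)
n_cells <- prod(dims)
print(n_cells)

# More than .Machine$integer.max (2^31 - 1), so m1 is a "long vector"
print(n_cells > .Machine$integer.max)

# Approximate size in GB, assuming m1 stores 8-byte doubles
print(n_cells * 8 / 1e9)

# Approximate size in GB of a 4-column index matrix for about 300 million
# rows of df, stored as doubles (8 bytes) and as integers (4 bytes)
n_rows <- 3e8
print(c(double = n_rows * 4 * 8 / 1e9, integer = n_rows * 4 * 4 / 1e9))
```

Under these assumptions m1 alone is roughly 30 GB. A four-column index matrix built from df would be about 9.6 GB as doubles, or 4.8 GB as integers. This means any vectorised approach also needs to budget for memory.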
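As far as I can tell from your question, you're just looking for matrix indexing. When an array with n dimensions is indexed by a numeric matrix with n columns, R treats each row of that matrix as one set of subscripts and returns one element per row.
Consider the following simplified example.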
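The same rule is easiest to see in two dimensions. Here is a minimal illustration with made-up values: each row of a two-column index matrix is one (row, column) pair, and the result has one element per row.

```r
# A 2 x 3 matrix filled column by column: columns are (1, 2), (3, 4), (5, 6)
m <- matrix(1:6, nrow = 2)

# Each row of idx is one (row, column) pair: (1, 3) and (2, 1)
idx <- cbind(c(1, 2), c(3, 1))

# Picks m[1, 3] and m[2, 1] in a single call
m[idx]
# [1] 5 2
```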
First, your array (with 4 dimensions).
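dim1 <- 2; dim2 <- 4; dim3 <- 2; dim4 <- 2
x <- dim1 * dim2 * dim3 * dim4  # total number of cells
# The values printed below come from the sample() algorithm used before
# R 3.6.0; on newer versions, run RNGkind(sample.kind = "Rounding") first
# to reproduce them
set.seed(1)
M <- array(sample(x), dim = c(dim1, dim2, dim3, dim4))  # random permutation of 1:x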
M
## , , 1, 1
##
## [,1] [,2] [,3] [,4]
## [1,] 9 18 6 29
## [2,] 12 27 25 17
##
## , , 2, 1
##
## [,1] [,2] [,3] [,4]
## [1,] 16 5 14 20
## [2,] 2 4 8 32
##
## , , 1, 2
##
## [,1] [,2] [,3] [,4]
## [1,] 31 28 24 7
## [2,] 15 11 3 23
##
## , , 2, 2
##
## [,1] [,2] [,3] [,4]
## [1,] 13 1 21 30
## [2,] 19 26 22 10
##
Second, your data.frame that has the indices of interest.
mydf <- data.frame(source = c(1, 1, 2, 2),
destination = c(1, 1, 2, 3),
year = c(1, 2, 1, 2),
ship = c(1, 1, 2, 1),
count = 0)
mydf
## source destination year ship count
## 1 1 1 1 1 0
## 2 1 1 2 1 0
## 3 2 2 1 2 0
## 4 2 3 2 1 0
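Third, extract by indexing M with the first four columns of mydf converted to a matrix. The as.matrix() call is needed because a data frame is not accepted as an array index: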
out <- M[as.matrix(mydf[1:4])]
out
# [1] 9 16 11 8
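Here is a sketch of what that matrix indexing computes. R stores arrays in column-major order, meaning the first subscript varies fastest. So M[i, j, k, l] sits at position i + (j - 1) * d1 + (k - 1) * d1 * d2 + (l - 1) * d1 * d2 * d3 of the underlying vector, where d1, d2 and d3 are the first three dimensions. The block below rebuilds the toy M and mydf from above, then checks that this arithmetic selects the same elements.

```r
# Rebuild the toy array and index data frame from above
dim1 <- 2; dim2 <- 4; dim3 <- 2; dim4 <- 2
set.seed(1)
M <- array(sample(dim1 * dim2 * dim3 * dim4), dim = c(dim1, dim2, dim3, dim4))
mydf <- data.frame(source = c(1, 1, 2, 2),
                   destination = c(1, 1, 2, 3),
                   year = c(1, 2, 1, 2),
                   ship = c(1, 1, 2, 1),
                   count = 0)

# Column-major linear position of each row's (source, destination, year, ship)
d <- dim(M)
lin <- mydf$source +
  (mydf$destination - 1) * d[1] +
  (mydf$year - 1) * d[1] * d[2] +
  (mydf$ship - 1) * d[1] * d[2] * d[3]
lin
# [1]  1  9 20 14

# Indexing the array as a plain vector gives the same values as matrix indexing
identical(M[lin], M[as.matrix(mydf[1:4])])
# [1] TRUE
```

If you computed positions this way for an array as large as m1, the arithmetic would need to be done in doubles. If all four index columns are stored as integers, R's integer arithmetic overflows past 2^31 - 1 and yields NA with a warning, so converting them with as.numeric() first would be needed. Treat this as a manual alternative that has not been tested at that scale.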
Fourth, compare with indexing each element one at a time:
M[1, 1, 1, 1]
# [1] 9
M[1, 1, 2, 1]
# [1] 16
M[2, 2, 1, 2]
# [1] 11
M[2, 3, 2, 1]
# [1] 8
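Applied to the question, the looked-up values can be assigned straight into the count column, replacing the for-loop. The real m1 is far too large for an example, so the sketch below uses a hypothetical stand-in array (called m1 here, filled with arbitrary values) that is just big enough to cover the indices in the sample df. The chunked variant at the end is an optional, untested-at-scale way to limit the size of the temporary index matrix; the chunk size is an arbitrary choice.

```r
# The sample rows from the question (same data as its dput() output)
df <- data.frame(source = rep(1L, 6), destination = rep(1415, 6),
                 year = 1:6, ship = rep(6, 6), count = 0)

# Hypothetical stand-in for m1: just large enough for the sample indices,
# filled with arbitrary values
m1 <- array(seq_len(1 * 1415 * 6 * 6), dim = c(1, 1415, 6, 6))

# One vectorised lookup replaces the whole for-loop
idx_cols <- c("source", "destination", "year", "ship")
df$count <- m1[as.matrix(df[idx_cols])]
one_shot <- df$count

# Spot-check against the example from the question
df$count[3] == m1[1, 1415, 3, 6]
# [1] TRUE

# Optional variation: the same lookup in chunks of rows, so only a
# chunk-sized index matrix exists at any one time
chunk <- 1e7
for (start in seq(1, nrow(df), by = chunk)) {
  rows <- start:min(start + chunk - 1, nrow(df))
  df$count[rows] <- m1[as.matrix(df[rows, idx_cols])]
}
```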