Reputation: 69
I have the following list, and I want to create a data.frame that holds every possible "path", when the numeric value of the arrays is> 0.
This is the list:
> ABBCCD2
$A1
$A1$B1
D1 D2
C1 0.233 0.078
C2 0.039 0.039
$A1$B2
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A1$B3
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A2
$A2$B1
D1 D2
C1 0.100 0.033
C2 0.017 0.017
$A2$B2
D1 D2
C1 0 0
C2 0 0
$A2$B3
D1 D2
C1 0 0
C2 0 0
And this is the result I want:
> res
FUN INTC INTB INME prob
1 A1 B1 C1 D1 0.233
2 A1 B1 C1 D2 0.078
3 A1 B1 C2 D1 0.039
4 A1 B1 C2 D2 0.039
5 A1 B2 C1 D1 0.083
6 A1 B2 C1 D2 0.028
7 A1 B2 C2 D1 0.056
8 A1 B2 C2 D2 0.056
9 A1 B3 C1 D1 0.083
10 A1 B3 C1 D2 0.028
11 A1 B3 C2 D1 0.056
12 A1 B3 C2 D2 0.056
13 A2 B1 C1 D1 0.100
14 A2 B1 C1 D2 0.033
15 A2 B1 C2 D1 0.017
16 A2 B1 C2 D2 0.017
I have solved it with for loops, but it is not efficient, since the real problem that I have there are 15 million possible paths, and it can take several days to solve it. This is the code I have made:
m <- 0
# creamos dataframe vacio
res <- data.frame(FUN=character(),INTC=character(),INTB=character(),INME=character(),prob=numeric())
for(i in 1:length(ABBCCD2)) { # A
for (j in 1:length(ABBCCD2[[1]])) { # B
for(k in 1:nrow(ABBCCD2[[1]][[1]])) { # C
for(f in 1:ncol(ABBCCD2[[1]][[1]])) { # D
# solo guardamos las prob > 0
if(ABBCCD2[[i]][[j]][k,f] > 0) {
# contador de caminos con probabilidad no-cero
m <- m + 1
# creamos la fila del data frame correspondiente y vamos rellenando
res[m,] <- data.frame(FUN=names(ABBCCD2[i]), INTC=names(ABBCCD2[[i]][j]), INTB=rownames(ABBCCD2[[i]][[j]])[k],
INME = colnames(ABBCCD2[[i]][[j]])[f] , prob = ABBCCD2[[i]][[j]][k,f] )
}else{
}
}
}
}
}
Any ideas to solve it more efficiently?
Thank you all
Upvotes: 3
Views: 114
Reputation: 42544
If I understand correctly, the challenges are
(Execution not necessarily in that order)
An additional challenge was that the question shows the nested list in printed form but not in a reproducible form, e.g., dput()
. See the Data section on turning the printout into a list structure.
For the sake of completeness, here are two other approaches.
lapply()
and rbindlist()
rrapply::rrapply()
and reshape2::melt()
lapply()
and rbindlist()
library(data.table)
library(magrittr)
res <- lapply(
ABBCCD2,
function(x) lapply(x, as.data.table, keep.rownames = "INTB") %>% rbindlist(idcol = "INTC")
) %>%
rbindlist(idcol = "FUN") %>%
melt(measure.vars = patterns("^D"), variable.name = "INME", value.name = "prob") %>%
.[prob != 0] %>%
setorderv(names(.))
res
FUN INTC INTB INME prob 1: A1 B1 C1 D1 0.233 2: A1 B1 C1 D2 0.078 3: A1 B1 C2 D1 0.039 4: A1 B1 C2 D2 0.039 5: A1 B2 C1 D1 0.083 6: A1 B2 C1 D2 0.028 7: A1 B2 C2 D1 0.056 8: A1 B2 C2 D2 0.056 9: A1 B3 C1 D1 0.083 10: A1 B3 C1 D2 0.028 11: A1 B3 C2 D1 0.056 12: A1 B3 C2 D2 0.056 13: A2 B1 C1 D1 0.100 14: A2 B1 C1 D2 0.033 15: A2 B1 C2 D1 0.017 16: A2 B1 C2 D2 0.017
magrittr
piping is used to improve readability.
This approach converts the single 2 x 2 matrices into data.tables with 3 columns and 2 rows each. These are then combined by rbindlist()
in two steps to form one large data.table. Finally, the two value columns are reshaped to long format and zero prob
values are removed.
setorderv()
is only used to allow for a direct comparison with OP's expected result.
Caveat: Zero prob
values are removed after all data have been turned into long format. This may lead to unexpected results in case one of the matrices contains a zero element just by chance.
melt()
Here is a different approach which first reshapes the matrices to long form data.tables (after excluding matrices with all zero elements) which are then combined into one large dataset by two rbindlist()
steps:
library(data.table)
library(magrittr)
library(rrapply)
res2 <- rrapply(ABBCCD2,
condition = function(x) sum(abs(x)) > 0,
f = function(x) reshape2::melt(x, value.name = "prob"),
classes = "matrix", how = "prune") %>%
lapply(rbindlist, idcol = "INTC") %>%
rbindlist(idcol = "FUN") %>%
setnames(c("Var1", "Var2"), c("INTB", "INME"))%>%
setorderv(names(.))
res2
The result is the same as above.
Here is a way to turn the printout into a nested list structure:
txt <- "$A1
$A1$B1
D1 D2
C1 0.233 0.078
C2 0.039 0.039
$A1$B2
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A1$B3
D1 D2
C1 0.083 0.028
C2 0.056 0.056
$A2
$A2$B1
D1 D2
C1 0.100 0.033
C2 0.017 0.017
$A2$B2
D1 D2
C1 0 0
C2 0 0
$A2$B3
D1 D2
C1 0 0
C2 0 0"
txt
contains the printout as copied and pasted from the question
library(data.table)
library(magrittr)
library(rrapply)
ABBCCD2 <- fread(text = txt, sep = NULL, header = FALSE, blank.lines.skip = TRUE) %>%
.[, tstrsplit(V1, "\\$")] %>%
.[, c("V2", "V3") := zoo::na.locf(.SD, na.rm = FALSE), .SDcols = c("V2", "V3")] %>%
.[V1 != ""] %>%
split(by = c("V2", "V3"), flatten = FALSE, keep.by = FALSE) %>%
rrapply(
f = . %>%
.[, paste0(V1, collapse = "\n") %>%
{paste("rn", .)} %>%
fread() %>%
as.matrix(rownames = "rn")]
, classes = "data.frame", how = "replace")
ABBCCD2
$A1 $A1$B1 D1 D2 C1 0.233 0.078 C2 0.039 0.039 $A1$B2 D1 D2 C1 0.083 0.028 C2 0.056 0.056 $A1$B3 D1 D2 C1 0.083 0.028 C2 0.056 0.056 $A2 $A2$B1 D1 D2 C1 0.100 0.033 C2 0.017 0.017 $A2$B2 D1 D2 C1 0 0 C2 0 0 $A2$B3 D1 D2 C1 0 0 C2 0 0
Upvotes: 3
Reputation: 101335
Here is a base R option using stack
rev(
transform(
stack(df <- as.data.frame(
rapply(ABBCCD2,
t,
how = "replace"
)
)),
ind = paste0(ind, ".", row.names(df))
)
)
which gives
ind values
1 A1.B1.C1.D1 -1.0978872
2 A1.B1.C1.D2 0.1646925
3 A1.B1.C2.D1 -0.8782714
4 A1.B1.C2.D2 1.2239280
5 A1.B2.C1.D1 0.2088934
6 A1.B2.C1.D2 -1.6247005
7 A1.B2.C2.D1 0.2191693
8 A1.B2.C2.D2 -0.4496129
9 A2.B1.C1.D1 0.3426282
10 A2.B1.C1.D2 1.8424623
11 A2.B1.C2.D1 -1.0963979
12 A2.B1.C2.D2 -0.2248845
13 A2.B2.C1.D1 -0.9655256
14 A2.B2.C1.D2 -1.2647063
15 A2.B2.C2.D1 0.6998366
16 A2.B2.C2.D2 0.4514344
17 A2.B3.C1.D1 0.0000000
18 A2.B3.C1.D2 0.0000000
19 A2.B3.C2.D1 0.0000000
20 A2.B3.C2.D2 0.0000000
Upvotes: 1
Reputation: 887108
Here is an option
library(rrapply)
library(purrr)
library(dplyr)
library(tidyr)
map_depth(ABBCCD2, 2, ~ as.data.frame.table(.x)) %>%
map_dfr(~ bind_rows(.x, .id = 'INTC'), .id = 'FUN') %>%
rename_at(3:5, ~c("INTB", "INME", "prob")) %>%
filter(prob != 0)
-output
# FUN INTC INTB INME prob
#1 A1 B1 C1 D1 -1.0978872
#2 A1 B1 C2 D1 -0.8782714
#3 A1 B1 C1 D2 0.1646925
#4 A1 B1 C2 D2 1.2239280
#5 A1 B2 C1 D1 0.2088934
#6 A1 B2 C2 D1 0.2191693
#7 A1 B2 C1 D2 -1.6247005
#8 A1 B2 C2 D2 -0.4496129
#9 A2 B1 C1 D1 0.3426282
#10 A2 B1 C2 D1 -1.0963979
#11 A2 B1 C1 D2 1.8424623
#12 A2 B1 C2 D2 -0.2248845
#13 A2 B2 C1 D1 -0.9655256
#14 A2 B2 C2 D1 0.6998366
#15 A2 B2 C1 D2 -1.2647063
#16 A2 B2 C2 D2 0.4514344
ABBCCD2 <- list(A1 = list(B1 = structure(c(-1.0978871935389, -0.878271447742256,
0.164692499183084, 1.22392804082201), .Dim = c(2L, 2L), .Dimnames = list(
c("C1", "C2"), c("D1", "D2"))), B2 = structure(c(0.208893448902667,
0.21916929248291, -1.62470051990683, -0.449612869059051), .Dim = c(2L,
2L), .Dimnames = list(c("C1", "C2"), c("D1", "D2")))), A2 = list(
B1 = structure(c(0.34262819072166, -1.09639792471103, 1.8424623311698,
-0.224884516346163), .Dim = c(2L, 2L), .Dimnames = list(c("C1",
"C2"), c("D1", "D2"))), B2 = structure(c(-0.965525564286861,
0.699836580462635, -1.26470634026811, 0.451434438203962), .Dim = c(2L,
2L), .Dimnames = list(c("C1", "C2"), c("D1", "D2"))), B3 = structure(c(0,
0, 0, 0), .Dim = c(2L, 2L), .Dimnames = list(c("C1", "C2"
), c("D1", "D2")))))
Upvotes: 3