I'm using the reticulate R package from RStudio to run some Python code to bring data from ROOT (http://root.cern.ch) into R. My problem is that the Python code returns a list of row-wise lists. For example, in Python,
[[0L, 0L, 'mu+', 1, 0, 0, 1, 3231.6421853545253, -17.361063509909364, 6322.884067996471, -2751.857298366544, 1.2318766603937736, 1407.9560948453036, 3092.931322317615],
[0L, 0L, 'nu_e', 3, 1, 0, 0, 3231.6421853545253, -17.361063509909364, 6322.884067996471, -743.6755000649275, 9.950229845741603, 342.4203222294634, 818.781981693865],
[0L, 0L, 'anti_nu_mu', 2, 1, 0, 0, 3231.6421853545253, -17.361063509909364, 6322.884067996471, -808.1114666690765, 21.680955968349267, 445.2784282520303, 922.9231198102832],
...]
reticulate turns these data into a corresponding list of lists in R:
List of 136972
$ :List of 14
..$ : int 0
..$ : int 0
..$ : chr "mu+"
..$ : int 1
..$ : int 0
..$ : int 0
..$ : int 0
..$ : num 7162
..$ : num -0.0108
..$ : num -627
..$ : num 264
..$ : num -3.24
..$ : num 3080
..$ : num 3093
$ :List of 14
..$ : int 0
..$ : int 0
..$ : chr "mu+"
..$ : int 1
.... (you get the idea)
I've searched everywhere I can think of and cannot find a way to turn these data into a data frame (I really want a tibble). One problem seems to be that the list entries are not named. There's a lot of data, so I don't want an inefficient approach. I can have the Python code return a dictionary of columns instead, and that works, but the Python code to build a row is so much simpler.
If there were an easy way to turn these row-wise lists into a data frame, that would be ideal. Any ideas?
Here are a couple of approaches that came to mind:
Option 1: We know how many items are in the sub-lists (how many columns are expected). Cycle through the list to make a new list with each relevant element from the sub-lists. Wrap that in as.data.frame and you're done.
myFun_1 <- function(inlist, expectedCols = 14) {
  as.data.frame(
    # For each column index x, pull element x out of every row
    lapply(sequence(expectedCols),
           function(x) {
             sapply(inlist, function(y) y[[x]])
           }),
    col.names = paste0("V", sequence(expectedCols)))
}
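If you want Option 1 without sapply()'s simplification step, one possible variant (my own substitution, not part of the approach above) flattens each column with unlist(lapply(...)). The function name rows_to_df and the shortened sample rows are hypothetical, and the column count defaults to the length of the first row, which assumes every row has the same length. I haven't timed it, so run the same system.time() check used for the two options before relying on it for 136,972 rows.

```r
# Hypothetical variant of Option 1: build each column with unlist(lapply(...))
# rather than sapply(), so there is no simplification step to second-guess.
# Assumes every row has the same length as the first one.
rows_to_df <- function(inlist, ncols = length(inlist[[1]])) {
  cols <- lapply(seq_len(ncols), function(j) {
    # Take field j from every row, then flatten to one atomic vector
    unlist(lapply(inlist, `[[`, j), use.names = FALSE)
  })
  names(cols) <- paste0("V", seq_len(ncols))
  # stringsAsFactors = FALSE keeps the particle names as character on R < 4.0
  as.data.frame(cols, stringsAsFactors = FALSE)
}

# Two shortened rows, for illustration only
rows <- list(
  list(0L, 0L, "mu+",  1, 0, 0, 1, 3231.64),
  list(0L, 0L, "nu_e", 3, 1, 0, 0, 3231.64)
)
str(rows_to_df(rows))
```

Like Option 2, this relies on unlist(), which silently drops NULL elements, so a NULL cell (what a Python None becomes) would shorten that column rather than produce an NA.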
Option 2: Use do.call(rbind, .) and then unlist each column to make a regular data.frame with no list columns.
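myFun_2 <- function(inlist) {
  # rbind the rows into a list-matrix, giving a data.frame of list columns
  x <- as.data.frame(do.call(rbind, inlist))
  # Flatten each list column to an atomic vector
  x[] <- lapply(x, unlist)
  x
}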
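To see why Option 2 needs the unlist step, it helps to look at the intermediate objects. This is a minimal sketch on two hypothetical three-field rows: do.call(rbind, ...) on a list of lists returns a matrix whose storage mode is list, so as.data.frame() turns each of its columns into a list column, and lapply(x, unlist) then flattens them into ordinary atomic vectors.

```r
# Two short, hypothetical rows in the same shape reticulate produces
rows <- list(list(0L, "mu+", 3231.64), list(0L, "nu_e", 3231.64))

m <- do.call(rbind, rows)  # a 2 x 3 matrix whose cells are list elements
print(typeof(m))
print(dim(m))

x <- as.data.frame(m)      # columns V1..V3 are still lists at this point
print(class(x$V2))

x[] <- lapply(x, unlist)   # flatten every column; x[] keeps the data.frame shell
str(x)
```

Assigning into x[] rather than x is what keeps the result a data.frame; x <- lapply(x, unlist) would return a plain list.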
Let's test these out with some sample data. Here's a list that should create a rectangular 3-row by 14-column dataset:
LL <- list(
  list(0L, 0L, 'mu+', 1, 0, 0, 1, 3231.6421853545253, -17.361063509909364,
       6322.884067996471, -2751.857298366544, 1.2318766603937736,
       1407.9560948453036, 3092.931322317615),
  list(0L, 0L, 'nu_e', 3, 1, 0, 0, 3231.6421853545253, -17.361063509909364,
       6322.884067996471, -743.6755000649275, 9.950229845741603,
       342.4203222294634, 818.781981693865),
  list(0L, 0L, 'anti_nu_mu', 2, 1, 0, 0, 3231.6421853545253,
       -17.361063509909364, 6322.884067996471, -808.1114666690765,
       21.680955968349267, 445.2784282520303, 922.9231198102832))
Here's a bigger version, which creates a 150,000-row by 14-column dataset:
Big_LL <- unlist(replicate(50000, LL, FALSE), FALSE)
Outcomes of each function on the small dataset:
myFun_1(LL)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## 1 0 0 mu+ 1 0 0 1 3231.642 -17.36106 6322.884 -2751.8573 1.231877
## 2 0 0 nu_e 3 1 0 0 3231.642 -17.36106 6322.884 -743.6755 9.950230
## 3 0 0 anti_nu_mu 2 1 0 0 3231.642 -17.36106 6322.884 -808.1115 21.680956
## V13 V14
## 1 1407.9561 3092.9313
## 2 342.4203 818.7820
## 3 445.2784 922.9231
myFun_2(LL)
## V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12
## 1 0 0 mu+ 1 0 0 1 3231.642 -17.36106 6322.884 -2751.8573 1.231877
## 2 0 0 nu_e 3 1 0 0 3231.642 -17.36106 6322.884 -743.6755 9.950230
## 3 0 0 anti_nu_mu 2 1 0 0 3231.642 -17.36106 6322.884 -808.1115 21.680956
## V13 V14
## 1 1407.9561 3092.9313
## 2 342.4203 818.7820
## 3 445.2784 922.9231
All looking good. Now, how about performance?
system.time(myFun_1(Big_LL))
## user system elapsed
## 2.65 0.05 2.75
system.time(myFun_2(Big_LL))
## user system elapsed
## 0.41 0.00 0.40
So, go with the second approach ;-)
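Since the question asks for a tibble, here is a minimal sketch that wraps Option 2, attaches real column names, and converts the result with tibble::as_tibble(). The function name rows_to_tibble and the column names are hypothetical placeholders (substitute the actual ROOT branch names), and the sketch falls back to returning the plain data.frame when the tibble package isn't installed.

```r
# Hypothetical helper: Option 2 plus column names and a tibble conversion
rows_to_tibble <- function(inlist, col_names) {
  x <- as.data.frame(do.call(rbind, inlist))  # data.frame of list columns
  x[] <- lapply(x, unlist)                    # flatten to atomic columns
  stopifnot(length(col_names) == ncol(x))     # one name per field
  names(x) <- col_names
  # Convert to a tibble when the package is available; otherwise keep the data.frame
  if (requireNamespace("tibble", quietly = TRUE)) tibble::as_tibble(x) else x
}

# Shortened rows and placeholder names, for illustration only
rows <- list(list(0L, "mu+", 3231.64), list(0L, "nu_e", 3231.64))
res <- rows_to_tibble(rows, c("event", "particle", "value"))
print(res)
```

Naming the columns on the R side also deals with the unnamed list entries mentioned in the question, without changing the simpler row-building Python code.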