Reputation: 11016
Assuming the following list:
x <- list(list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),
list(q = 1888L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"),
list(q = 1890L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"))
I want to convert this list into a data frame (rowwise). Usually, dplyr::bind_rows works well. However, one element of my list ("properties") is sometimes empty, in which case bind_rows silently drops those rows and keeps only the rows where the element is non-empty.
Can someone explain why that is?
And is there any (short) fix for it? I'm currently using rather ugly workarounds using list2DF, then transposing, then converting to data frame, then assigning names.
Wrong result (only rows with non-empty properties are kept):
x |>
bind_rows()
# A tibble: 2 × 3
q properties last_Import_Date
<int> <list> <chr>
1 1888 <named list [2]> 2024-09-16
2 1890 <named list [2]> 2024-09-16
UPDATE: where I need some additional help is with unnesting such a special "properties" column. Using unnest_longer will result in the same "bug" that deletes the NULL row, and using unnest_wider requires an extra workaround to fix the names.
Upvotes: 8
Views: 517
Reputation: 3902
bind_rows uses vctrs::data_frame under the hood. It turns out that vctrs::data_frame creates an empty data frame (zero rows) whenever any element has length 0 (e.g. list(), integer(0), character(0), etc.), because the length-1 elements are recycled to length 0:
vctrs::data_frame(!!!list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),.name_repair="unique")
[1] q properties last_Import_Date
<0 rows> (or 0-length row.names)
vctrs::data_frame(a=list("a"),b= integer(0))
[1] a b
<0 rows> (or 0-length row.names)
vctrs::data_frame(a=list(),b= 1)
[1] a b
<0 rows> (or 0-length row.names)
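The zero rows are consistent with the vctrs recycling rules, under which a size-1 input is recycled to the size of the other inputs, including size 0. The following is a minimal sketch that checks the common size directly with vctrs::vec_size_common; it assumes the vctrs package is installed and uses the field values from the question.

```r
library(vctrs)

# A size-1 input is recycled to match the other inputs, so a single
# zero-length field pulls the common size (the row count) down to 0.
size_with_empty <- vec_size_common(1880L, list(), "2024-09-16")
size_with_empty

# If the empty list is itself wrapped in a list, the field has size 1
# and the common size stays at 1.
size_with_wrapped <- vec_size_common(1880L, list(list()), "2024-09-16")
size_with_wrapped
```

This suggests that any approach which gives every field of a row exactly one element should keep the row.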
One alternative is to use vctrs::vec_rbind:
vctrs::vec_rbind(!!!x)
q properties last_Import_Date
1 1880 NULL 2024-09-16
2 1888 x, y 2024-09-16
3 1890 x, y 2024-09-16
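If staying with bind_rows is preferred, one possible workaround (a sketch, not something proposed in this answer) is to wrap the properties field of each row in list() so that every field has length 1 before binding. The helper name wrap_properties is hypothetical, and the sketch assumes dplyr is installed.

```r
library(dplyr)

x <- list(list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),
          list(q = 1888L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"),
          list(q = 1890L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"))

# Hypothetical helper: wrap the list-valued field so that it always
# has length 1, even when the original field is an empty list.
wrap_properties <- function(row) {
  row$properties <- list(row$properties)
  row
}

res <- x |> lapply(wrap_properties) |> bind_rows()
nrow(res)
```

The properties column is then a list column whose first entry is the empty list, so the row for q = 1880 is retained.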
Upvotes: 5
Reputation: 270045
1) bind_rows. bind_rows will work if you pre- and post-process the input like this:
library(dplyr)
x |> lapply(unlist) |> bind_rows() |> type.convert(as.is = TRUE)
## # A tibble: 3 × 4
## q last_Import_Date properties.a properties.b
## <int> <chr> <chr> <chr>
## 1 1880 2024-09-16 <NA> <NA>
## 2 1888 2024-09-16 x y
## 3 1890 2024-09-16 x y
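To see why this works, it may help to look at what unlist does to a single row. The sketch below uses only base R and the x from the question: the empty properties list simply contributes no elements, nested names are joined with ".", and everything is coerced to character, which is why type.convert is applied afterwards.

```r
x <- list(list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),
          list(q = 1888L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"),
          list(q = 1890L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"))

# Row with empty properties: only q and last_Import_Date survive
row1 <- unlist(x[[1]])
names(row1)

# Row with properties: the nested fields become properties.a and properties.b
row2 <- unlist(x[[2]])
names(row2)

# Mixing integer and character values yields a character vector
class(row2)
```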
2) transpose. Transposing x and then removing the extra layer of lists in properties allows us to use hoist to hoist a and b from properties.
library(purrr)
library(tidyr)
x |>
transpose() |>
list2DF() |>
transform(properties = lapply(properties, unlist)) |>
hoist(properties, "a", "b")
## q a b last_Import_Date
## 1 1880 <NA> <NA> 2024-09-16
## 2 1888 x y 2024-09-16
## 3 1890 x y 2024-09-16
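As an illustration of the first step, the sketch below shows what purrr::transpose returns for x: a list of rows becomes a list of columns, each with one entry per row, so the empty properties entry is kept as a zero-length element rather than dropped. It assumes purrr is installed; note that transpose is marked as superseded in recent purrr versions, although it still works.

```r
library(purrr)

x <- list(list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),
          list(q = 1888L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"),
          list(q = 1890L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"))

# Turn the list of rows into a list of columns
cols <- transpose(x)
names(cols)

# Every column has one entry per row
lengths(cols)

# The first properties entry is an empty list, not a missing row
str(cols$properties, max.level = 1)
```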
3) Base R. If a list column for properties is sufficient, then this double iteration uses only base R:
Map(\(z) sapply(x, "[[", z), names(x[[1]])) |> list2DF()
## q properties last_Import_Date
## 1 1880 NULL 2024-09-16
## 2 1888 x, y 2024-09-16
## 3 1890 x, y 2024-09-16
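The inner sapply call behaves differently per field, which the following base R sketch makes visible: q simplifies to an integer vector, while properties stays a list because its entries have different lengths. Since the result of sapply depends on the data, lapply is shown as a variant that always returns a list for a field that should remain a list column.

```r
x <- list(list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),
          list(q = 1888L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"),
          list(q = 1890L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"))

# All entries have length 1, so sapply simplifies to an integer vector
q_col <- sapply(x, "[[", "q")
str(q_col)

# Entries have lengths 0, 1, 1, so sapply cannot simplify and returns a list
prop_col <- sapply(x, "[[", "properties")
str(prop_col, max.level = 1)

# lapply never simplifies, so the result is a list regardless of the data
prop_col_list <- lapply(x, "[[", "properties")
```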
4) rrapply. rrapply can create the data frame directly:
library(rrapply)
rrapply(x, how = "bind")
## q last_Import_Date properties.1.a properties.1.b
## 1 1880 2024-09-16 <NA> <NA>
## 2 1888 2024-09-16 x y
## 3 1890 2024-09-16 x y
5) Recursive. This base R solution is longer than the others, but it may be of interest anyway. We define getField, which, given a list that represents a row, finds and returns the value of the input field name (argument field), or NA if none is found. Map iterates over Names (q, last_Import_Date, a, b) and uses sapply to iterate over the rows for a given name.
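getField <- function(x, field) {
  ret <- NA  # returned when the field is not found
  if (is.list(x)) {
    if (field %in% names(x)) {
      ret <- x[[field]]
    } else {
      # search nested lists in order and stop at the first non-NA hit
      for (el in x) if (!is.na(ret <- Recall(el, field))) break
    }
  }
  ret
}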
# Names <- c("q", "a", "b", "last_Import_Date")
Names <- sub(".*\\.", "", unique(names(unlist(x))))
Map(\(fld) sapply(x, getField, field = fld), Names) |> list2DF()
## q last_Import_Date a b
## 1 1880 2024-09-16 <NA> <NA>
## 2 1888 2024-09-16 x y
## 3 1890 2024-09-16 x y
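The derivation of Names can be followed step by step with the base R sketch below. It flattens x, collects the distinct flattened names, and strips everything up to the last "." so that properties.a becomes a. One caveat worth hedging: if two nested fields shared the same leaf name, stripping the prefix would make them collide.

```r
x <- list(list(q = 1880L, properties = list(), last_Import_Date = "2024-09-16"),
          list(q = 1888L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"),
          list(q = 1890L, properties = list(list(a = "x", b = "y")), last_Import_Date = "2024-09-16"))

# Names of all leaves across all rows, with nested names joined by "."
flat_names <- names(unlist(x))

# Distinct names in order of first appearance
distinct_names <- unique(flat_names)
distinct_names

# Drop the prefix up to the last "." to obtain the bare field names
Names <- sub(".*\\.", "", distinct_names)
Names
```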
Upvotes: 5
Reputation: 102529
If you want to use unnest without removing empty entries in properties, you should specify the option keep_empty = TRUE (based on @one's vec_rbind approach):
library(tidyr)  # provides unnest(), everything() and %>%
vctrs::vec_rbind(!!!x) %>%
unnest(cols = everything(), keep_empty = TRUE)
which gives
# A tibble: 3 × 3
q properties last_Import_Date
<int> <list> <chr>
1 1880 <NULL> 2024-09-16
2 1888 <named list [2]> 2024-09-16
3 1890 <named list [2]> 2024-09-16
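The effect of keep_empty can be seen in isolation on a small made-up tibble (the column names id and vals are illustrative only, not from the question). This sketch assumes tidyr and tibble are installed.

```r
library(tidyr)

# Toy data: the first row has an empty (NULL) list entry
df <- tibble::tibble(id = 1:2, vals = list(NULL, c("x", "y")))

# Default behaviour: the row with the empty entry is dropped
dropped <- unnest(df, vals)
nrow(dropped)

# keep_empty = TRUE keeps that row and fills the value with NA
kept <- unnest(df, vals, keep_empty = TRUE)
nrow(kept)
kept
```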
A base R equivalent might be
list2DF(
lapply(
as.data.frame(do.call(rbind, x)),
\(v) unlist(replace(v, lengths(v) == 0, list(list(NULL))), FALSE)
)
)
which gives
q properties last_Import_Date
1 1880 NULL 2024-09-16
2 1888 x, y 2024-09-16
3 1890 x, y 2024-09-16
and the structure looks like
'data.frame': 3 obs. of 3 variables:
$ q : int 1880 1888 1890
$ properties :List of 3
..$ : NULL
..$ :List of 2
.. ..$ a: chr "x"
.. ..$ b: chr "y"
..$ :List of 2
.. ..$ a: chr "x"
.. ..$ b: chr "y"
$ last_Import_Date: chr "2024-09-16" "2024-09-16" "2024-09-16"
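The key step in the base R version is replacing zero-length entries before flattening one level. The sketch below isolates that step on a stand-alone list v that mirrors the properties column; without the replacement, the empty entry disappears when unlist is applied.

```r
# Stand-alone copy of the properties column: one entry per row
v <- list(list(),
          list(list(a = "x", b = "y")),
          list(list(a = "x", b = "y")))

# Flattening one level directly loses the empty entry
naive <- unlist(v, recursive = FALSE)
length(naive)

# Replace zero-length entries with list(NULL) so each entry has length 1
padded <- replace(v, lengths(v) == 0, list(list(NULL)))
lengths(padded)

# Now flattening one level keeps one element per row, with NULL for the empty one
flat <- unlist(padded, recursive = FALSE)
length(flat)
```

For columns such as q, where every entry is a length-1 atomic value, the same unlist call simply produces an atomic vector.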
Here is a base R quick fix
> as.data.frame(do.call(rbind, x))
q properties last_Import_Date
1 1880 NULL 2024-09-16
2 1888 x, y 2024-09-16
3 1890 x, y 2024-09-16
and its structure shows that every column, including q and last_Import_Date, is a list column:
> as.data.frame(do.call(rbind, x)) %>% str()
'data.frame': 3 obs. of 3 variables:
$ q :List of 3
..$ : int 1880
..$ : int 1888
..$ : int 1890
$ properties :List of 3
..$ : list()
..$ :List of 1
.. ..$ :List of 2
.. .. ..$ a: chr "x"
.. .. ..$ b: chr "y"
..$ :List of 1
.. ..$ :List of 2
.. .. ..$ a: chr "x"
.. .. ..$ b: chr "y"
$ last_Import_Date:List of 3
..$ : chr "2024-09-16"
..$ : chr "2024-09-16"
..$ : chr "2024-09-16"
Upvotes: 4