Jan

Reputation: 5254

Check "emptiness" of list containing empty vectors (which R does not recognise as empty list)

I have a list that is the result of a row selection in a data frame. The issue is that sometimes there is no row to select and it returns a list in this form: a non-empty list with no actual content.

L <- list(combattech = character(0), damage = character(0), bonus = character(0), 
          range = structure(list(close = character(0), medium = character(0), far = character(0)), 
                            row.names = integer(0), class = "data.frame"), 
          ammo = character(0), weight = character(0), name = character(0), 
          price = character(0), sf = character(0))

I want to verify if I actually have a meaningful result and not a list with all elements being empty vectors. But a list with empty vectors is not equivalent to an empty list:

length(L) == 0
#> [1] FALSE

does not give me TRUE because the length is 9, not 0.

Of course, I could simply check length(which(...row selection...)) before I pick the selection, and usually I do, but in this case I do not have access to the original row indices.

all(sapply(L, length) == 0)
#> [1] FALSE

also does not work (i.e. it returns FALSE) because the nested data frame range has length 3: the length of a data frame is its number of columns, not its number of rows.
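To see where the 3 comes from, here is a minimal sketch. The object name `empty_df` is only an illustration and stands in for the `range` element above: a data frame is a list of columns, so `length()` counts columns while `nrow()` counts rows.

```r
# A zero-row data frame with three columns, mirroring the `range` element.
empty_df <- data.frame(close = character(0),
                       medium = character(0),
                       far = character(0),
                       stringsAsFactors = FALSE)

length(empty_df)  # counts columns
#> [1] 3
nrow(empty_df)    # counts rows
#> [1] 0
```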

Created on 2020-06-28 by the reprex package (v0.3.0)

Upvotes: 3

Views: 385

Answers (4)

Jan

Reputation: 5254

I checked all proposed solutions. They all work in the positive case (L is empty) …

L0 <- list(combattech = character(0), damage = character(0), bonus = character(0), 
           range = structure(list(close = character(0), medium = character(0), far = character(0)), 
                             row.names = integer(0), class = "data.frame"), 
           ammo = character(0), weight = character(0), name = character(0), price = character(0), sf = character(0))

all(rapply(L0, length) == 0) # Solution 1
#> [1] TRUE
all(sapply(L0, function(x) if(is.data.frame(x)) nrow(x) else length(x)) == 0) # Solution 2
#> [1] TRUE
all(sapply(L0, NROW) == 0) # Solution 3
#> [1] TRUE
length(unlist(L0)) == 0 # Solution 4
#> [1] TRUE
require(purrr)
#> Loading required package: purrr
every(L0, ~ NROW(.) == 0) # Solution 5
#> [1] TRUE

… and in the negative case (L has content):

L1 <- list(combattech = "ranged", damage = "1d", bonus = "+3", 
           range = structure(list(close = "20", medium = "40", far = "80"), 
                             row.names = integer(0), class = "data.frame"), 
           ammo = "arrow", weight = "1.5 Stone", name = "Bow", price = "120 silver", sf = "3/5")

all(rapply(L1, length) == 0) # Solution 1
#> [1] FALSE
all(sapply(L1, function(x) if(is.data.frame(x)) nrow(x) else length(x)) == 0) # Solution 2
#> [1] FALSE
all(sapply(L1, NROW) == 0) # Solution 3
#> [1] FALSE
length(unlist(L1)) == 0 # Solution 4
#> [1] FALSE
every(L1, ~ NROW(.) == 0) # Solution 5
#> [1] FALSE

Applying NROW to the whole list, however, does not work, even when we coerce L1 into a data frame. The coercion fails because the elements do not agree on the number of rows:

NROW(as.data.frame(L1)) == 0 # Solution 6 only works with empty lists
#> Error in (function (..., row.names = NULL, check.rows = FALSE, check.names = TRUE, : arguments imply differing number of rows: 1, 0

I wanted to choose an approach based on performance, benchmarking both a positive and a negative example.

require(microbenchmark)
#> Loading required package: microbenchmark
L40 <- list(combattech = rep("ranged", 40),
            damage = rep(paste0(1:2, "d"), each = 20),
            bonus = paste0("+", 1:40),
            range = structure(list(close = "20", medium = "40", far = "80"),
                              row.names = integer(0), class = "data.frame"),
            ammo = rep(c("arrow", "bolt"), 20),
            weight = paste0(0.5 * 1:40, " Stone"),
            name = rep(c("bow", "crossbow"), 20),
            price = paste(seq(10, 10 * 40, 10), "silver"),
            sf = rep("3/5", 40))
microbenchmark(
  unlist   = {length(unlist(L0)) == 0; length(unlist(L1)) == 0; length(unlist(L40)) == 0},
  rapply   = {all(rapply(L0, length) == 0); all(rapply(L1, length) == 0); all(rapply(L40, length) == 0)},
  NROW     = {all(sapply(L0, NROW) == 0); all(sapply(L1, NROW) == 0); all(sapply(L40, NROW) == 0)},
  long.one = {all(sapply(L0, function(x) if(is.data.frame(x)) nrow(x) else length(x)) == 0); all(sapply(L1, function(x) if(is.data.frame(x)) nrow(x) else length(x)) == 0); all(sapply(L40, function(x) if(is.data.frame(x)) nrow(x) else length(x)) == 0)},
  purrr    = {every(L0, ~ NROW(.) == 0); every(L1, ~ NROW(.) == 0); every(L40, ~ NROW(.) == 0)},
  times = 5E3)
#> Unit: microseconds
#>      expr  min    lq      mean median     uq    max neval
#>    unlist 81.5  83.4  84.68564   84.2  84.90 1365.7  5000
#>    rapply 27.9  31.9  36.44792   34.1  35.60 6015.9  5000
#>      NROW 51.3  56.0  60.63962   58.0  60.30 1657.4  5000
#>  long.one 61.1  67.2  72.01368   69.4  71.90 3727.1  5000
#>     purrr 97.7 108.2 116.74834  111.6 114.95 1917.5  5000

I am glad that I finally added an example with 40 rows. With only 1 row (as in L1), the unlist approach was by far the fastest. With the 40-row list included, rapply is the fastest (median 34.1 µs versus 84.2 µs for unlist).

So, the final recommendation is:

  • Use the unlist approach when you expect a small number of rows (or none).
  • Use rapply if the list usually contains a larger number of rows and you want to filter out occasional empty lists.
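As a rough sketch of how the rapply variant could be packaged for reuse: the helper name `is_empty_selection` and the two small test lists are made up for illustration and do not come from any package.

```r
# Hypothetical helper: TRUE when every leaf of a possibly nested list
# (including data frame columns) has length zero.
is_empty_selection <- function(x) {
  # rapply() descends into nested lists and data frames and
  # returns one length per leaf vector.
  all(rapply(x, length) == 0)
}

# Illustrative inputs, much smaller than the lists benchmarked above.
empty_sel  <- list(name = character(0),
                   range = data.frame(close = character(0)))
filled_sel <- list(name = "Bow",
                   range = data.frame(close = "20"))

is_empty_selection(empty_sel)
#> [1] TRUE
is_empty_selection(filled_sel)
#> [1] FALSE
```

Swapping the body for `length(unlist(x)) == 0` would give the unlist variant with the same interface.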

Created on 2020-06-28 by the reprex package (v0.3.0)

Upvotes: 0

tmfmnk

Reputation: 39858

One purrr solution using the basic logic provided by @user20650 and @Ronak Shah:

library(purrr)
every(L, ~ NROW(.) == 0)

[1] TRUE

Upvotes: 1

G. Grothendieck

Reputation: 269491

1) We can use rapply to recursively walk the structure and return a flat result.

all(rapply(L, length) == 0)
## [1] TRUE

2) Another approach is to unlist it first:

length(unlist(L)) == 0
## [1] TRUE
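To make the intermediate results of both approaches visible, here is a small sketch on a reduced list. `L_demo`, `leaf_lengths` and `flat` are illustrative names, not part of the question's data.

```r
# A reduced stand-in for L: one empty vector plus an empty nested data frame.
L_demo <- list(name = character(0),
               range = data.frame(close = character(0),
                                  far = character(0),
                                  stringsAsFactors = FALSE))

# rapply() visits each leaf vector, so the data frame contributes
# one entry per column instead of a single length of 2.
leaf_lengths <- rapply(L_demo, length)
leaf_lengths

# unlist() concatenates all leaf values into one vector;
# nothing is left when every leaf is empty.
flat <- unlist(L_demo)
length(flat)
```

Here `leaf_lengths` has three entries, all zero, and `flat` has length zero, so both checks come out TRUE.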

Upvotes: 3

Ronak Shah

Reputation: 388907

You can check whether each element of the list is a data frame and, if so, use its number of rows instead of its length:

all(sapply(L, function(x) if(is.data.frame(x)) nrow(x) else length(x)) == 0)
#[1] TRUE

We can use NROW, as suggested by @user20650, which makes this more compact:

all(sapply(L, NROW) == 0)
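The reason NROW can replace the explicit `is.data.frame` branch is that it treats a plain vector as a one-column matrix and returns its length, while for a data frame it returns the number of rows. A short sketch (the name `counts` and its labels are only for illustration):

```r
# NROW() on vectors and data frames, empty and non-empty.
counts <- c(
  empty_vector = NROW(character(0)),
  two_values   = NROW(c("a", "b")),
  empty_df     = NROW(data.frame(close = character(0))),
  two_row_df   = NROW(data.frame(close = c("20", "40")))
)
counts
#> empty_vector   two_values     empty_df   two_row_df 
#>            0            2            0            2
```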

Upvotes: 3
