Matt
Matt

Reputation: 994

How to combine frequency tables with missing values?

I have the following list of tables:

    list(structure(c(`0` = 19L, `1` = 2L, `3` = 43L), .Dim = 3L, .Dimnames = structure(list(
    c("0", "1", "3")), .Names = ""), class = "table"), structure(c(`0` = 7L, 
`1` = 9L, `2` = 5L, `3` = 43L), .Dim = 4L, .Dimnames = structure(list(
    c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 14L, 
`1` = 2L, `2` = 4L, `3` = 44L), .Dim = 4L, .Dimnames = structure(list(
    c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 21L, 
`1` = 8L, `2` = 2L, `3` = 33L), .Dim = 4L, .Dimnames = structure(list(
    c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 23L, 
`1` = 3L, `2` = 1L, `3` = 37L), .Dim = 4L, .Dimnames = structure(list(
    c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 19L, 
`1` = 2L, `2` = 4L, `3` = 39L), .Dim = 4L, .Dimnames = structure(list(
    c("0", "1", "2", "3")), .Names = ""), class = "table"), structure(c(`0` = 22L, 
`1` = 1L, `2` = 4L, `3` = 37L), .Dim = 4L, .Dimnames = structure(list(
    c("0", "1", "2", "3")), .Names = ""), class = "table"))

Each table is a tally of observations of values 0, 1, 2, or 3. However, not all values are represented in all tables, so some of the tables have missing columns. I would like those missing values to be assigned 0 in the final output.

merge doesn't work well on lists, and lapplying rbind doesn't work because not all the tables have matching columns.

How can I combine these tables into a single matrix or data.frame with one column for each value (0, 1, 2, 3) and one row for each count (7 in this example)?

The final output should look like this:

structure(list(`0` = c(19L, 7L, 14L, 21L, 23L, 19L, 22L), `1` = c(2L, 
9L, 2L, 8L, 3L, 2L, 1L), `2` = c(0L, 5L, 4L, 2L, 1L, 4L, 4L), 
    `3` = c(43L, 43L, 44L, 33L, 37L, 39L, 37L)), class = "data.frame", row.names = c(NA, 
-7L))

Upvotes: 3

Views: 165

Answers (2)

Anders Ellern Bilgrau
Anders Ellern Bilgrau

Reputation: 10233

In base R and assuming your list is named mylist, then you can do something like the following.

all_names <- sort(unique(unlist(lapply(mylist, names))))

res <- do.call("rbind", lapply(mylist, function(x) x[all_names]))
print(res)
#      0 1 <NA>  3
#[1,] 19 2   NA 43
#[2,]  7 9    5 43
#[3,] 14 2    4 44
#[4,] 21 8    2 33
#[5,] 23 3    1 37
#[6,] 19 2    4 39
#[7,] 22 1    4 37

Now, you can either accept that or do a few edits to get it perfect:

colnames(res) <- all_names  # Ensure correct colnames
res[is.na(res)] <- 0        # Overwrite NAs with 0
print(res)
#      0 1 2  3
#[1,] 19 2 0 43
#[2,]  7 9 5 43
#[3,] 14 2 4 44
#[4,] 21 8 2 33
#[5,] 23 3 1 37
#[6,] 19 2 4 39
#[7,] 22 1 4 37

Upvotes: 2

akrun
akrun

Reputation: 887691

We convert the individual datasets to data.frame with map and use bind_rows to row bind the datasets to a single dataset

library(tidyverse)
map(lst, as.data.frame.list, check.names = FALSE) %>% 
          bind_rows

Upvotes: 1

Related Questions