Sarah
Sarah

Reputation: 463

Creating a count table based on each value in each column in R

I have a data table that begins like this:

    test <- data.frame("AZ" = c("ABCD, AAAA, BBBB"), "AK" = c("ABCD"), "NJ" = c("BBBC", "AAAA"))

Each column is a state (Arizona, Alaska, New Jersey) and the values within these columns are codes. If there is more than one code in a single cell, it will be separated by a comma.

I would like to create a count table that shows how many of each code shows up for each state. So the table would end up looking like this in the end:

    test2 <- data.frame("ABCD" = c("2", "2", "0"), "AAAA" = c("2", "0", "1"), "BBBB" = c("2", "0", "0"), "BBBC" = c("0", "0", "1"), row.names = c("AZ", "AK", "NJ"))

Is there a way to get a table like this to output in R? I've been playing around with dplyr, but no luck so far. Thank you for your help :)

Upvotes: 3

Views: 3529

Answers (1)

akrun
akrun

Reputation: 886938

We can gather into 'long' format, then split the 'value' by the delimiter , with separate_rows, get the frequency with count, spread to 'wide' format

library(tidyverse)
gather(test) %>% 
   separate_rows(value) %>% 
   count(key, value) %>%
   spread(value, n, fill = 0) %>%
   column_to_rownames('key')
#   AAAA ABCD BBBB BBBC
#AK    0    2    0    0
#AZ    2    2    2    0
#NJ    1    0    0    1

NOTE: If we need the output in 'long' format, there is no need to spread

gather(test) %>% 
    separate_rows(value) %>% 
    count(key, value)
# A tibble: 6 x 3
#  key   value     n
#  <chr> <chr> <int>
#1 AK    ABCD      2
#2 AZ    AAAA      2
#3 AZ    ABCD      2
#4 AZ    BBBB      2
#5 NJ    AAAA      1
#6 NJ    BBBC      1

Update

if we also need to group by 'row', create a row_number() column, then do the gather, into 'long' format and count on the united 'key' and 'rn' column

test %>%
   mutate(rn = row_number()) %>% 
   gather(key, val, -rn) %>%
   separate_rows(val) %>% 
   unite(key, key, rn) %>% 
   count(key, val) %>% 
   spread(val, n, fill = 0) %>%
   column_to_rownames('key')
#      AAAA ABCD BBBB BBBC
#AK_1    0    1    0    0
#AK_2    0    1    0    0
#AZ_1    1    1    1    0
#AZ_2    1    1    1    0
#NJ_1    0    0    0    1
#NJ_2    1    0    0    0

Or using base R

table(stack(lapply(test, function(x) unlist(strsplit(as.character(x), ", "))))[2:1])

Upvotes: 5

Related Questions