Reputation: 463
I have a data table that begins like this:
test <- data.frame("AZ" = c("ABCD, AAAA, BBBB"), "AK" = c("ABCD"), "NJ" = c("BBBC", "AAAA"))
Each column is a state (Arizona, Alaska, New Jersey) and the values within these columns are codes. If there is more than one code in a single cell, it will be separated by a comma.
I would like to create a count table that shows how many of each code shows up for each state. So the table would end up looking like this in the end:
test2 <- data.frame("ABCD" = c("2", "2", "0"), "AAAA" = c("2", "0", "1"), "BBBB" = c("2", "0", "0"), "BBBC" = c("0", "0", "1"), row.names = c("AZ", "AK", "NJ"))
Is there a way to get a table like this to output in R? I've been playing around with dplyr, but no luck so far. Thank you for your help :)
Upvotes: 3
Views: 3529
Reputation: 886938
We can gather
into 'long' format, then split the 'value' by the delimiter ,
with separate_rows
, get the frequency with count
, spread
to 'wide' format
library(tidyverse)
gather(test) %>%
separate_rows(value) %>%
count(key, value) %>%
spread(value, n, fill = 0) %>%
column_to_rownames('key')
# AAAA ABCD BBBB BBBC
#AK 0 2 0 0
#AZ 2 2 2 0
#NJ 1 0 0 1
NOTE: If we need the output in 'long' format, there is no need to spread
gather(test) %>%
separate_rows(value) %>%
count(key, value)
# A tibble: 6 x 3
# key value n
# <chr> <chr> <int>
#1 AK ABCD 2
#2 AZ AAAA 2
#3 AZ ABCD 2
#4 AZ BBBB 2
#5 NJ AAAA 1
#6 NJ BBBC 1
if we also need to group by 'row', create a row_number()
column, then do the gather
, into 'long' format and count
on the unite
d 'key' and 'rn' column
test %>%
mutate(rn = row_number()) %>%
gather(key, val, -rn) %>%
separate_rows(val) %>%
unite(key, key, rn) %>%
count(key, val) %>%
spread(val, n, fill = 0) %>%
column_to_rownames('key')
# AAAA ABCD BBBB BBBC
#AK_1 0 1 0 0
#AK_2 0 1 0 0
#AZ_1 1 1 1 0
#AZ_2 1 1 1 0
#NJ_1 0 0 0 1
#NJ_2 1 0 0 0
Or using base R
table(stack(lapply(test, function(x) unlist(strsplit(as.character(x), ", "))))[2:1])
Upvotes: 5