Reputation: 3689
A simple data.frame with character columns:
df <- data.frame(x = c("a", "b", "c", "c"), y = c("a", "b", "b", "c"))
Suppose I want to count the categories in each column, quickly, and return the result as another data.frame. The following, using map from purrr, is elegant and works:
library(purrr)  # provides map(), set_names() and re-exports %>%
df %>%
  map(table) %>%
  Reduce(cbind, .) %>%
  data.frame() %>%
  set_names(c("x", "y"))
  x y
a 1 1
b 1 2
c 2 1
However, what should I do when not all categories appear in every column? For example:
df2 <- data.frame(x = c("a", "b", "b"), y = c("a", "a", "a"))
I would want the count for b in the y column to be 0. But I get:
df2 %>%
  map(table) %>%
  Reduce(cbind, .) %>%
  data.frame() %>%
  set_names(c("x", "y"))
  x y
a 1 3
b 2 3
Without even a warning! I'm guessing this is because of cbind's habit of recycling the elements of one column to match the length of another. I tried using qpcR:::cbind.na to at least get NA values for the missing categories, which I could later convert to 0, but I get this error:
Error in matrix(, maxRow - nrow(x), ncol(x)) :
invalid 'ncol' value (too large or NA)
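The recycling guess above can be checked in isolation. This is a minimal sketch, assuming two hand-written named vectors (x_counts and y_counts are hypothetical stand-ins for the per-column table() results of df2, not objects from the question):

```r
# Hypothetical stand-ins for the per-column counts of df2:
# x has two categories, y has only one.
x_counts <- c(a = 1, b = 2)
y_counts <- c(a = 3)

# cbind() recycles the shorter vector. Because 2 is a whole multiple of 1,
# no warning is raised and the count for "a" in y is repeated in the "b" row.
m <- cbind(x = x_counts, y = y_counts)
print(m)
```

The "b" row of the y column ends up holding the recycled 3 rather than 0 or NA, which matches the output shown in the question.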
What's a great, fast solution, preferably from the tidyverse set of packages?
UPDATE:
For the first case, where we know all categories appear in all columns:
df %>% dmap(function(x) as.numeric(table(x)))
is probably much more elegant.
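For the second case, one possible base R sketch is to tabulate every column against the same set of factor levels, so that a category missing from a column is counted as 0 rather than dropped. This is an assumption-laden illustration, not something from the original post: lvls and counts are names introduced here, and sapply is used in place of dmap.

```r
df2 <- data.frame(x = c("a", "b", "b"), y = c("a", "a", "a"),
                  stringsAsFactors = FALSE)

# Collect every category seen in any column (hypothetical helper name).
lvls <- sort(unique(unlist(df2, use.names = FALSE)))

# Tabulate each column against the shared levels; categories absent
# from a column get a count of 0 instead of disappearing.
counts <- as.data.frame(
  sapply(df2, function(col) table(factor(col, levels = lvls)))
)
print(counts)
```

Because every per-column table now has the same length, there is nothing left for cbind-style recycling to paper over.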
Upvotes: 1
Views: 692
Reputation: 78600
You can use gather() and spread() from tidyr, with dplyr's count() in the middle.
library(dplyr)
library(tidyr)
df2 <- data_frame(x = c("a", "b", "b"), y = c("a", "a", "a"))
df2 %>%
  gather(key, value) %>%
  count(key, value) %>%
  spread(key, n, fill = 0)
Result:
  value     x     y
* <chr> <dbl> <dbl>
1     a     1     3
2     b     2     0
The fill = 0 in spread() is what causes the b/y pair to be 0.
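In more recent tidyr releases (1.0.0 and later), gather() and spread() are superseded by pivot_longer() and pivot_wider(). The following is a sketch of the same pipeline under that assumption, where values_fill plays the role of fill; the result name counts is introduced here for illustration only.

```r
library(dplyr)
library(tidyr)

df2 <- tibble(x = c("a", "b", "b"), y = c("a", "a", "a"))

counts <- df2 %>%
  # Stack all columns into key/value pairs, one row per cell.
  pivot_longer(everything(), names_to = "key", values_to = "value") %>%
  # Count each (column, category) combination that actually occurs.
  count(key, value) %>%
  # Spread back to one column per original column; combinations that
  # never occurred are filled with 0 instead of NA.
  pivot_wider(names_from = key, values_from = n, values_fill = 0L)

print(counts)
```

Here count() produces an integer column n, so the fill value is written as 0L to keep the column type consistent.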
Upvotes: 1