Luis

Reputation: 1584

Creating a frequency table with dplyr that counts factor levels and missing values

Some questions are similar to this one (here or here, for example), and I know one solution that works, but I am looking for a more elegant one.

I work in epidemiology and have variables coded as 1 and 0 (or NA). Example: does the patient have cancer? NA or 0 means no; 1 means yes.

Let's say I have several variables in my dataset and I want to count only the values equal to 1. It's a classic frequency table, but dplyr is making this more complicated than I expected at first glance.

My code works:

dataset %>%
  select(VISimpair, HEARimpai, IntDis, PhyDis, EmBehDis, LearnDis, 
         ComDis, ASD, HealthImpair, DevDelays) %>%  # replace to your needs
  summarise_all(funs(sum(1-is.na(.))))

You can reproduce it with this example:

library(tidyverse)
dataset <- data.frame(var1 = rep(c(NA,1),100), var2=rep(c(NA,1),100))

dataset %>% select(var1, var2) %>% summarise_all(funs(sum(1-is.na(.))))

What I really want is to select the variables I need, count how many 0 (or NA) values and how many 1 values each one has, and report the result in a table like this: [desired output]

Thanks.

Upvotes: 1

Views: 2391

Answers (2)

David

Reputation: 311

What about the following frequency table per variable?

First, I edit your sample data to include 0s as well, and load the necessary libraries.

library(tidyr)
library(dplyr)
dataset <- data.frame(var1 = rep(c(NA,1,0),100), var2=rep(c(NA,1,0),100))

Second, I reshape the data to long format with gather, which makes it easy to group_by variable and value, and then build the frequency table with count, as mentioned by CPak.

dataset %>%
    select(var1, var2) %>%
    gather(var, val) %>%
    mutate(val = factor(val)) %>%
    group_by(var, val) %>%
    count()

# A tibble: 6 x 3
# Groups:   var, val [6]
  var   val       n
  <chr> <fct> <int>
1 var1  0       100
2 var1  1       100
3 var1  NA      100
4 var2  0       100
5 var2  1       100
6 var2  NA      100
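If NA should be treated as "no", as described in the question, the long table can be collapsed into one row per variable. The following is only a sketch under that assumption: it uses pivot_longer and pivot_wider (tidyr 1.0.0 or later) in place of gather, and the names status and freq are illustrative, not taken from the question.

```r
library(dplyr)
library(tidyr)

# Same toy data as above: 100 each of NA, 1 and 0 per column
dataset <- data.frame(var1 = rep(c(NA, 1, 0), 100),
                      var2 = rep(c(NA, 1, 0), 100))

freq <- dataset %>%
  select(var1, var2) %>%
  # Long format: one row per (variable, value) pair
  pivot_longer(everything(), names_to = "var", values_to = "val") %>%
  # Assumption: NA and 0 both mean "no"; only 1 means "yes"
  mutate(status = if_else(!is.na(val) & val == 1, "yes", "no")) %>%
  count(var, status) %>%
  # Wide format: one row per variable, one column per status
  pivot_wider(names_from = status, values_from = n, values_fill = 0L)

freq
```

This gives one row per variable with a "no" column and a "yes" column, which may be closer to a report-style table than the long output.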

Upvotes: 2

bala83

Reputation: 443

A quick and dirty method to do this is to coerce your input into factors:

dataset$var1 = as.factor(dataset$var1)
dataset$var2 = as.factor(dataset$var2)
summary(dataset$var1)
summary(dataset$var2)

summary tells you the number of occurrences of each level of the factor, along with the number of NAs.
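The same idea can probably be applied to every column at once in base R. This is a rough sketch, assuming the columns only hold 0, 1 or NA; the name counts is illustrative.

```r
# Toy data: 100 each of NA, 1 and 0 per column
dataset <- data.frame(var1 = rep(c(NA, 1, 0), 100),
                      var2 = rep(c(NA, 1, 0), 100))

# Tabulate every column, fixing the levels to 0 and 1 so that all
# columns return the same three rows (0, 1 and NA)
counts <- sapply(dataset, function(x) {
  table(factor(x, levels = c(0, 1)), useNA = "always")
})

counts
```

The result is a matrix with one column per variable and one row for each of 0, 1 and NA, so it avoids calling summary separately on every variable.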

Upvotes: 0
