Lecheng Tong
Lecheng Tong

Reputation: 11

Generate frequency table of levels of multiple categorical variables for all observations in R

I have a data set that looks something like this

ID        Q1 Q2 Q3
Person1   A  C  NA
Person2   B  C  D
Person3   A  C  A

Essentially, it's a table of responses to a set of multiple choice questions.

I've been trying to figure out a way to, in R, generate a profile of response for each individual.

The final output would look something like:

           A    B    C   D   NA
Person1   .33   0  .33   0  .33
Person2    0   .33 .33  .33  0
Person3   .66   0  .33   0   0

I've tried messing around with the crosstab() function and various ways of moving stuff around with dplyr and tidyr. I've also Googled every variation of "R frequency table", to not much success.

Am I missing some really obvious way to do this?

Upvotes: 1

Views: 437

Answers (3)

BENY
BENY

Reputation: 323226

I am using melt first , then table + prop.table

s=reshape2::melt(df,id.vars='ID')
s[is.na(s)]='NA'
prop.table(table(s$ID,as.character(s$value)),1)

                  A         B         C         D        NA
  Person1 0.3333333 0.0000000 0.3333333 0.0000000 0.3333333
  Person2 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000
  Person3 0.6666667 0.0000000 0.3333333 0.0000000 0.0000000

Upvotes: 0

Shree
Shree

Reputation: 11140

Here's a way with tidyverse -

df %>% 
  gather(var, value, -ID) %>% 
  replace_na(list(value = "Missing")) %>% 
  count(ID, value) %>% 
  group_by(ID) %>% 
  mutate(
    prop = n/sum(n)
  ) %>% 
  select(-n) %>% 
  spread(value, prop, fill = 0)

# A tibble: 3 x 6
# Groups:   ID [3]
  ID          A     B     C     D Missing
  <chr>   <dbl> <dbl> <dbl> <dbl>   <dbl>
1 Person1 0.333 0     0.333 0       0.333
2 Person2 0     0.333 0.333 0.333   0    
3 Person3 0.667 0     0.333 0       0   

Data -

df <- read.table(text = "ID Q1 Q2 Q3
Person1 A C NA
Person2 B C D
Person3 A C A", header = T, sep = " ", stringsAsFactors = F)

Upvotes: 1

yake84
yake84

Reputation: 3236

This is similar to Shree just with annotation for the steps

library(tidyverse)

df <-
  tibble(
    ID = paste0("Person", 1:3),
    Q1 = c("A", "B", "A"),
    Q2 = rep("C", 3),
    Q3 = c(NA, "D", "A")
  )

df %>% 
  # this will flip the data from wide to long
  # and create 2 new columns "var" and "letter"
  # using all the columns not = ID
  gather(key = var, value = letter, -ID) %>%
  # count how many 
  group_by(ID) %>% 
  mutate(total = n()) %>% 
  ungroup() %>% 
  # groups by ID & letter & counts, creates a column "n" 
  # can also use a group by
  count(ID, letter, total) %>% 
  # do the math
  mutate(pct = round(n/total, 2)) %>% 
  # keep just these 3 columns
  select(ID, letter, pct) %>% 
  # the inverse of gather(). Will take the letter column to
  # make new columns for each unique value and will put the 
  # pct values underneath them. Any NA will become a 0
  spread(key = letter, value = pct, fill = 0)

#  ID          A     B     C     D `<NA>`
#  <chr>   <dbl> <dbl> <dbl> <dbl>  <dbl>
# Person1  0.33  0     0.33  0      0.33
# Person2  0     0.33  0.33  0.33   0   
# Person3  0.67  0     0.33  0      0   

Upvotes: 0

Related Questions