Eli
Eli

Reputation: 290

How can I convert data frame of survey responses to a frequency table?

I have an R dataframe of survey results. Each column is a response to a question on the survey. It can take values 1 to 10 and NA. I would like turn this into a frequency table.

This is an example of the data I have. I'm pretending the values go from 1 to 3, instead of 1 to 10.

data.frame(
  "Person" = c(1,2,3),
  "Question1" = c(NA, "1", "1"),
  "Question2" = c("1", "2", "3")
)

What I want:

data.frame(
  "Question" = c("Question1", "Question2"),
  "Frequency of 1" = c(2, 1),
  "Frequency of 2" = c(0 , 1),
  "Frequency of 3" = c(0, 1)
)

I have tried using likert() from the likert package, but I'm getting fractional results which cannot be correct. Is there a simple solution to this problem?

Upvotes: 3

Views: 646

Answers (4)

Khaynes
Khaynes

Reputation: 1986

A data.table solution:

require(data.table)
setDT(df)    

# Melt data:
df <- melt(df, id.vars = "Person", value.name = "Question")

# Cast data to required structure:
df <- data.frame(dcast(df, variable ~ Question))

# Rename variables and remove NA count (as per Ops question):
names(df)[1] <- "Question"
names(df)[-1] <- gsub("X", "Frequency of ", names(df)[-1])
df$NA. <- NULL

df
#   Question Frequency of 1 Frequency of 2 Frequency of 3
#1 Question1              2              0              0
#2 Question2              1              1              1

Or a one line answer:

dcast(melt(setDT(df), id.vars="Person", value.name="Question")[!Question %in% NA][, Question := paste0("Frequency of ", Question)], variable ~ Question)

Upvotes: 2

tmfmnk
tmfmnk

Reputation: 39858

A different tidyverse possibility could be:

df %>%
 gather(Question, val, -Person, na.rm = TRUE) %>%
 group_by(Question, val) %>%
 summarise(res = length(val)) %>%
 ungroup() %>%
 mutate(val = paste0("Frequency.of.", val)) %>%
 spread(val, res, fill = NA)

  Question  Frequency.of.1 Frequency.of.2 Frequency.of.3
  <chr>              <int>          <int>          <int>
1 Question1              2             NA             NA
2 Question2              1              1              1

Here it, first, transforms the data from wide to long format. Second, it calculates the frequencies according the questions. Finally, it creates the "Frequency.of." variables and returns the data to its desired shape.

Or if you want to calculate also the NA values per questions:

df %>%
 gather(Question, val, -Person) %>%
 group_by(Question, val) %>%
 summarise(res = length(val)) %>%
 ungroup() %>%
 mutate(val = paste0("Frequency.of.", val)) %>%
 spread(val, res, fill = NA)

  Question  Frequency.of.1 Frequency.of.2 Frequency.of.3 Frequency.of.NA
  <chr>              <int>          <int>          <int>           <int>
1 Question1              2             NA             NA               1
2 Question2              1              1              1              NA

Upvotes: 1

Adam Gruer
Adam Gruer

Reputation: 126

Here is a solution using the dplyr and purrr packages

library(dplyr)
library(purrr)

data.frame(
  "Person" = c(1,2,3),
  "Question1" = c(NA, "1", "1"),
  "Question2" = c("1", "2", "3")
)

df %>% 
  select(-Person) %>% 
  mutate_all(~ factor(.x, levels =  as.character(1:10) ) %>% addNA() ) %>% 
  map(table) %>% 
  transpose() %>% 
  map(as.integer) %>% 
  set_names( ~ paste0("Frequency of ",ifelse(is.na(.), "NA", .))) %>% 
  as_tibble() %>% 
  mutate(Question = setdiff(names(df),"Person")) %>% 
  select(Question,everything(), "Frequency of NA" = `Frequency of ` ) 

Upvotes: 3

NelsonGon
NelsonGon

Reputation: 13309

This is not the most elegant but might help: df2 is your data set. Data:

   df2<-data.frame(
  "Person" = c(1,2,3),
  "Question1" = c(NA, "1", "1"),
  "Question2" = c("1", "2", "3"),stringsAsFactors = F
)

Target: EDIT:: You could "automate" as follows

df2[is.na(df2)]<-0 #To allow numeric manipulation
values<-c("1","2","3")
    Final_df<-sapply(values,function(val) apply(df2[,-1],2,function(x) sum(x==val)))
    Final_df<-as.data.frame(Final_df)
    names(Final_df)<-paste0("Frequency of_",1:ncol(Final_df))

This yields:

             Frequency of_1          Frequency of_2          Frequency of_3
Question1              2                0                    0
Question2              1                1                    1

Upvotes: 0

Related Questions