Reputation: 11
I have a data set that looks something like this
ID Q1 Q2 Q3
Person1 A C NA
Person2 B C D
Person3 A C A
Essentially, it's a table of responses to a set of multiple choice questions.
I've been trying to figure out a way to, in R, generate a profile of response for each individual.
The final output would look something like:
A B C D NA
Person1 .33 0 .33 0 .33
Person2 0 .33 .33 .33 0
Person3 .66 0 .33 0 0
I've tried messing around with the crosstab() function and various ways of moving stuff around with dplyr and tidyr. I've also Googled every variation of "R frequency table", to not much success.
Am I missing some really obvious way to do this?
Upvotes: 1
Views: 437
Reputation: 323226
I am using melt
first , then table
+ prop.table
s=reshape2::melt(df,id.vars='ID')
s[is.na(s)]='NA'
prop.table(table(s$ID,as.character(s$value)),1)
A B C D NA
Person1 0.3333333 0.0000000 0.3333333 0.0000000 0.3333333
Person2 0.0000000 0.3333333 0.3333333 0.3333333 0.0000000
Person3 0.6666667 0.0000000 0.3333333 0.0000000 0.0000000
Upvotes: 0
Reputation: 11140
Here's a way with tidyverse
-
df %>%
gather(var, value, -ID) %>%
replace_na(list(value = "Missing")) %>%
count(ID, value) %>%
group_by(ID) %>%
mutate(
prop = n/sum(n)
) %>%
select(-n) %>%
spread(value, prop, fill = 0)
# A tibble: 3 x 6
# Groups: ID [3]
ID A B C D Missing
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Person1 0.333 0 0.333 0 0.333
2 Person2 0 0.333 0.333 0.333 0
3 Person3 0.667 0 0.333 0 0
Data -
df <- read.table(text = "ID Q1 Q2 Q3
Person1 A C NA
Person2 B C D
Person3 A C A", header = T, sep = " ", stringsAsFactors = F)
Upvotes: 1
Reputation: 3236
This is similar to Shree just with annotation for the steps
library(tidyverse)
df <-
tibble(
ID = paste0("Person", 1:3),
Q1 = c("A", "B", "A"),
Q2 = rep("C", 3),
Q3 = c(NA, "D", "A")
)
df %>%
# this will flip the data from wide to long
# and create 2 new columns "var" and "letter"
# using all the columns not = ID
gather(key = var, value = letter, -ID) %>%
# count how many
group_by(ID) %>%
mutate(total = n()) %>%
ungroup() %>%
# groups by ID & letter & counts, creates a column "n"
# can also use a group by
count(ID, letter, total) %>%
# do the math
mutate(pct = round(n/total, 2)) %>%
# keep just these 3 columns
select(ID, letter, pct) %>%
# the inverse of gather(). Will take the letter column to
# make new columns for each unique value and will put the
# pct values underneath them. Any NA will become a 0
spread(key = letter, value = pct, fill = 0)
# ID A B C D `<NA>`
# <chr> <dbl> <dbl> <dbl> <dbl> <dbl>
# Person1 0.33 0 0.33 0 0.33
# Person2 0 0.33 0.33 0.33 0
# Person3 0.67 0 0.33 0 0
Upvotes: 0