Reputation: 1
I want to compute, for each individual, the percentage of each categorical answer (response level) within each question type (TYPE). Every individual has multiple responses per type, and each response takes one of several levels. In the result:
1) each individual should be on a different row, and
2) the columns should be the TYPE + response level combinations, with each value being the percentage of times that response level was given for that question type by that individual.
The DATA looks like this:
SUBJECT TYPE RESPONSE
John a kappa
John b gamma
John a delta
John a gamma
Mary a kappa
Mary a delta
Mary b kappa
Mary a gamma
Bill b delta
Bill a gamma
The result should look like this:
SUBJECT a-kappa a-gamma a-delta b-kappa b-gamma b-delta
John 0.33 0.33 0.33 0.00 1.00 0.00
Mary 0.33 0.33 0.33 1.00 0.00 0.00
Bill 0.00 1.00 0.00 0.00 0.00 1.00
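To make the calculation concrete, here is a minimal base R sketch that types in the sample data and computes the share of each response level within each SUBJECT/TYPE pair. The object names `sample_df`, `counts` and `shares` are chosen for illustration only, and this is just one possible way to do the arithmetic.

```r
# Sample data from the question, typed in by hand
sample_df <- data.frame(
  SUBJECT  = c("John", "John", "John", "John",
               "Mary", "Mary", "Mary", "Mary",
               "Bill", "Bill"),
  TYPE     = c("a", "b", "a", "a", "a", "a", "b", "a", "b", "a"),
  RESPONSE = c("kappa", "gamma", "delta", "gamma",
               "kappa", "delta", "kappa", "gamma",
               "delta", "gamma"),
  stringsAsFactors = FALSE
)

# Three-way table of counts: SUBJECT x TYPE x RESPONSE
counts <- xtabs(~ SUBJECT + TYPE + RESPONSE, data = sample_df)

# Divide every count by the total for its SUBJECT/TYPE pair
shares <- prop.table(counts, margin = c(1, 2))

# Flatten to one row per subject, one column per TYPE/RESPONSE pair
print(ftable(round(shares, 2), row.vars = "SUBJECT"))
```

A SUBJECT/TYPE pair with no answers at all would come out as NaN here (0 divided by 0), so real data may need an extra check for that case.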
Based on c1au61o_HH's answer I was able to create something that works for my actual data file, but will still need some post-processing. (It is also not very elegant, but that's a minor concern.)
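library(dplyr)

Finaldf <- mydata %>%
  group_by(Subject, Type) %>%
  mutate(TOT = n()) %>%                    # answers per subject and type
  group_by(Subject, Response, Type) %>%
  mutate(RESPTOT = n()) %>%                # answers per subject, type and response level
  ungroup() %>%
  distinct() %>%                           # keep one row per combination
  mutate(Percentage = RESPTOT / TOT)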
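The remaining post-processing is mostly a reshape from long to wide. A hedged sketch of that step follows, assuming a reasonably recent tidyr (`pivot_wider()` with a scalar `values_fill`). The small `mydata` below is a stand-in built from the John and Bill rows of the sample data, and `wide` is a name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Stand-in for the real file; column names follow the code above
mydata <- tibble(
  Subject  = c("John", "John", "John", "John", "Bill", "Bill"),
  Type     = c("a", "b", "a", "a", "b", "a"),
  Response = c("kappa", "gamma", "delta", "gamma", "delta", "gamma")
)

# Same long table as above: one row per Subject/Type/Response
Finaldf <- mydata %>%
  group_by(Subject, Type) %>%
  mutate(TOT = n()) %>%
  group_by(Subject, Type, Response) %>%
  mutate(RESPTOT = n()) %>%
  ungroup() %>%
  distinct() %>%
  mutate(Percentage = RESPTOT / TOT)

# Post-processing: drop the helper counts, then go from long to wide.
# Combinations a subject never gave are filled with 0 instead of NA.
wide <- Finaldf %>%
  select(Subject, Type, Response, Percentage) %>%
  pivot_wider(
    names_from  = c(Type, Response),
    names_sep   = "-",
    values_from = Percentage,
    values_fill = 0
  )

print(wide)
```

Columns appear in the order the combinations are first seen, so they may need reordering with `select()` afterwards.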
Any help is much appreciated, ideally with some explanation.
Upvotes: 0
Views: 61
Reputation: 897
This is probably not the most efficient way, but if you want to use the tidyverse
you can unite the two columns and then use two separate group_by calls:
one to calculate the total for each subject, and one to calculate the percentages.
library(tidyverse)

df %>%
  unite(TYPE_RESPONSE, c("TYPE", "RESPONSE"), sep = "_") %>%
  group_by(SUBJECT) %>%
  mutate(TOT = n()) %>%                          # answers per subject
  group_by(SUBJECT, TYPE_RESPONSE) %>%
  # TOT is constant within a group, so first() keeps one value per group
  summarize(perc = n() / first(TOT) * 100) %>%
  spread(TYPE_RESPONSE, perc)
DATA:
df <- tibble(
  SUBJECT  = rep(c("John", "Mary", "Bill"), each = 4),
  TYPE     = rep(c("a", "b"), 6),
  RESPONSE = rep(c("kappa", "gamma", "delta"), 4)
)
EDIT in reply to comment:
I understand that you want to calculate the percentage by SUBJECT and TYPE, so the code would be something like this:
library(tidyverse)

df %>%
  group_by(SUBJECT, TYPE) %>%
  mutate(TOT = n()) %>%                          # answers per subject and type
  unite(TYPE_RESPONSE, c("TYPE", "RESPONSE"), sep = "_") %>%
  group_by(SUBJECT, TYPE_RESPONSE) %>%
  # TOT is constant within a group, so first() keeps one value per group
  summarize(perc = n() / first(TOT) * 100) %>%
  spread(TYPE_RESPONSE, perc)
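Since `spread()` has been superseded by `pivot_wider()` in newer tidyr releases, the same per-SUBJECT-and-TYPE calculation could also be sketched with `count()`, `complete()` and `pivot_wider()`. This is an alternative formulation, not the code from the answer above: `complete()` is used so that combinations a subject never gave show up as 0 rather than NA, and `result` is a name chosen for illustration.

```r
library(dplyr)
library(tidyr)

# Same sample data as in the DATA block of this answer
df <- tibble(
  SUBJECT  = rep(c("John", "Mary", "Bill"), each = 4),
  TYPE     = rep(c("a", "b"), 6),
  RESPONSE = rep(c("kappa", "gamma", "delta"), 4)
)

result <- df %>%
  # one row per observed SUBJECT/TYPE/RESPONSE combination, with its count n
  count(SUBJECT, TYPE, RESPONSE) %>%
  # add the combinations that never occurred, with a count of 0
  complete(SUBJECT, TYPE, RESPONSE, fill = list(n = 0L)) %>%
  # percentage of each response level within its SUBJECT/TYPE pair
  group_by(SUBJECT, TYPE) %>%
  mutate(perc = n / sum(n) * 100) %>%
  ungroup() %>%
  select(-n) %>%
  # one row per subject, one column per TYPE_RESPONSE pair
  pivot_wider(names_from = c(TYPE, RESPONSE), names_sep = "_", values_from = perc)

print(result)
```

If a subject has no answers at all for some TYPE, the division is 0 by 0 and the corresponding cells become NaN, so that case would need separate handling.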
Upvotes: 1