Reputation: 9793
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))
df_dummy <- data.frame(Question = c("Q1", "Q2", "Q3"),
                       X1 = c(2/4, 3/4, 2/4),
                       X0 = c(2/4, 1/4, 2/4))
> dummy
  Q1 Q2 Q3
1  0  1  0
2  1  1  1
3  0  0  1
4  1  1  0
> df_dummy
  Question   X1   X0
1       Q1 0.50 0.50
2       Q2 0.75 0.25
3       Q3 0.50 0.50
I have some data (dummy) with binary responses to Q1, Q2, and Q3. I want to summarize it in the format shown in df_dummy, where, for each question, column X1 gives the proportion of people who answered 1 and column X0 gives the proportion who answered 0. I tried prop.table, but it didn't return the desired result.
Upvotes: 1
Views: 192
Reputation: 43334
A tidyverse option:
library(tidyr)
library(janitor)
dummy %>%
gather(question, val) %>% # reshape to long form
tabyl(question, val) %>% # make crosstab table
adorn_percentages("row") %>%
clean_names()
question x0 x1
Q1 0.50 0.50
Q2 0.25 0.75
Q3 0.50 0.50
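Since tidyr 1.0.0, gather() is superseded by pivot_longer(). The following is a minimal sketch of the same idea with pivot_longer() and dplyr, assuming both packages are installed. The names Question, val, X1, X0 and res are chosen here to mirror df_dummy and are not part of the answer above.

```r
library(dplyr)
library(tidyr)

# Example data from the question
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

res <- dummy %>%
  # reshape to long form: one row per (question, response) pair
  pivot_longer(everything(), names_to = "Question", values_to = "val") %>%
  group_by(Question) %>%
  # the mean of a logical vector is the share of TRUE values
  summarise(X1 = mean(val == 1),
            X0 = mean(val == 0))
res
```

This avoids the janitor dependency, at the cost of spelling out the two proportions by hand.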
Upvotes: 1
Reputation: 194
Another way is to use do.call and lapply. Note that this returns counts rather than proportions (row 1 counts the 0s, row 2 counts the 1s):
do.call(cbind,lapply(dummy,function(x) data.frame(table(x))[,2]))
# Q1 Q2 Q3
[1,] 2 1 2
[2,] 2 3 2
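The matrix above holds counts, so one more step is needed to get the proportions the question asks for. A possible sketch, assuming every column contains both a 0 and a 1 (otherwise the rows would not line up as "0s, then 1s"); the names counts and props are illustrative only:

```r
# Example data from the question
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Counts per question: row 1 = number of 0s, row 2 = number of 1s
counts <- do.call(cbind, lapply(dummy, function(x) data.frame(table(x))[, 2]))

# Divide by the number of respondents, then put questions in rows
props <- t(counts / nrow(dummy))
colnames(props) <- c("X0", "X1")  # assumes both values occur in every column
props
```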
Upvotes: 2
Reputation: 24074
Another way is counting the proportion of 1s and then deducing from that the proportion of 0s:
X1 <- colSums(dummy==1)/nrow(dummy)
df_dummy <- data.frame(X1, X0=1-X1)
df_dummy
# X1 X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50
NB, inspired by @akrun's idea of colMeans: you can also use colMeans instead of dividing colSums by the number of rows to define X1:
X1 <- colMeans(dummy==1)
df_dummy <- data.frame(X1, X0=1-X1)
df_dummy
# X1 X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50
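The result above keeps the question labels as row names, whereas df_dummy in the question has them in a Question column. If an exact match matters, one way to get it is sketched below; the name result is illustrative only.

```r
# Example data from the question
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

X1 <- colMeans(dummy == 1)

# unname() drops the Q1/Q2/Q3 names so the data frame gets default row names
result <- data.frame(Question = names(X1),
                     X1 = unname(X1),
                     X0 = 1 - unname(X1))
result
```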
Upvotes: 4
Reputation: 887118
We can do this with table and prop.table:
t(sapply(dummy, function(x) prop.table(table(x))))
# 0 1
#Q1 0.50 0.50
#Q2 0.25 0.75
#Q3 0.50 0.50
Or a more efficient approach is to call table once:
prop.table(table(stack(dummy)[2:1]),1)
# values
#ind 0 1
# Q1 0.50 0.50
# Q2 0.25 0.75
# Q3 0.50 0.50
Or another option is colMeans (inspired by @Cath's use of colSums):
X0 <- colMeans(!dummy)
data.frame(X1 = 1 - X0, X0)
# X1 X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50
Upvotes: 2
Reputation: 388982
We can try apply with MARGIN = 2 and divide the counts of each value by the total length of the column:
t(apply(dummy, 2, function(x) table(x)/length(x)))
# 0 1
#Q1 0.50 0.50
#Q2 0.25 0.75
#Q3 0.50 0.50
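One caveat with calling table on each column separately: if a question happens to have only 0s or only 1s, table returns a single count for that column, the per-column results have different lengths, and apply can no longer simplify them to a matrix. A possible guard is to fix the levels with factor first. In the sketch below, dummy2 is a hypothetical variant of the data in which everyone answered 1 to Q2.

```r
# Hypothetical data: nobody answered 0 to Q2
dummy2 <- data.frame(Q1 = c(0, 1, 0, 1),
                     Q2 = c(1, 1, 1, 1),
                     Q3 = c(0, 1, 1, 0))

# factor(x, levels = 0:1) makes table() report both values, even with a zero count
props <- t(apply(dummy2, 2, function(x) table(factor(x, levels = 0:1)) / length(x)))
props
```

With the levels fixed, the Q2 row shows a proportion of 0 for the answer 0 instead of dropping that column.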
Upvotes: 4
Reputation: 2885
Less elegantly than in the answer above:
d <- t(dummy)
cbind(X0 = (ncol(d) - rowSums(d)) / ncol(d), X1 = rowSums(d) / ncol(d))
Or, to avoid computing the same stuff twice, and to get a data frame:
d <- t(dummy)
i <- ncol(d)
j <- rowSums(d)
data.frame(Question = rownames(d), X0 = (i - j) / i, X1 = j / i)
There you go:
Question X0 X1
Q1 Q1 0.50 0.50
Q2 Q2 0.25 0.75
Q3 Q3 0.50 0.50
Upvotes: 1