Adrian

Reputation: 9793

Summarizing count data as proportion in a data.frame

dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))
df_dummy <- data.frame(Question = c("Q1", "Q2", "Q3"),
                       X1 = c(2/4, 3/4, 2/4),
                       X0 = c(2/4, 1/4, 2/4))

> dummy
  Q1 Q2 Q3
1  0  1  0
2  1  1  1
3  0  0  1
4  1  1  0

> df_dummy
  Question   X1   X0
1       Q1 0.50 0.50
2       Q2 0.75 0.25
3       Q3 0.50 0.50

I have some data (dummy) with binary responses to Q1, Q2, and Q3. I want to summarize it in the format shown in df_dummy, where, for each question, column X1 gives the proportion of people who answered 1 and column X0 gives the proportion of people who answered 0. I tried prop.table, but that didn't return the desired result.

Upvotes: 1

Views: 192

Answers (6)

alistaire

Reputation: 43334

A tidyverse option:

library(tidyr)
library(janitor)

dummy %>%
  gather(question, val) %>%    # reshape to long form
  tabyl(question, val) %>%    # make crosstab table
  adorn_percentages("row") %>%    # convert counts to row proportions
  clean_names()                   # rename columns 0/1 to x0/x1

 question   x0   x1
       Q1 0.50 0.50
       Q2 0.25 0.75
       Q3 0.50 0.50

Upvotes: 1

Aayush Agrawal

Reputation: 194

Another way is to use do.call and lapply. Note that this returns the counts (first row: 0s, second row: 1s) rather than the proportions:

do.call(cbind,lapply(dummy,function(x) data.frame(table(x))[,2]))
#      Q1 Q2 Q3
# [1,]  2  1  2
# [2,]  2  3  2
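
To turn these counts into the proportions the question asks for, one option is to divide by the number of rows. This is only a sketch: the names `counts` and `props` and the X0/X1 row labels are illustrative, and it assumes every column contains both a 0 and a 1, so that each table has exactly two entries.

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Counts per question: row 1 holds the 0s, row 2 holds the 1s
counts <- do.call(cbind, lapply(dummy, function(x) data.frame(table(x))[, 2]))

# Divide by the number of respondents to get proportions
props <- counts / nrow(dummy)
rownames(props) <- c("X0", "X1")  # assumes both values occur in every column

# Transpose so that each question is a row
t(props)
```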

Upvotes: 2

Cath

Reputation: 24074

Another way is counting the proportion of 1s and then deducing from that the proportion of 0s:

X1 <- colSums(dummy==1)/nrow(dummy)
df_dummy <- data.frame(X1, X0=1-X1)
df_dummy
#     X1   X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50

NB, inspired by @akrun's idea of colMeans: you can also use colMeans instead of dividing colSums by the number of rows to define X1:

X1 <- colMeans(dummy==1)
df_dummy <- data.frame(X1, X0=1-X1)
df_dummy
#     X1   X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50
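
If the question labels should be a regular column, as in the df_dummy layout from the question, rather than row names, something along these lines might work. It is a minimal sketch: the name `result` is illustrative, and it assumes the responses are only 0 or 1 with no missing values.

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Proportion of 1s per question; assumes responses are only 0 or 1 (no NA)
X1 <- colMeans(dummy == 1)

result <- data.frame(Question = names(dummy),
                     X1 = X1,
                     X0 = 1 - X1,
                     row.names = NULL)  # drop the Q1/Q2/Q3 row names
result
```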

Upvotes: 4

akrun

Reputation: 887118

We can do this with table and prop.table

t(sapply(dummy, function(x) prop.table(table(x))))
#     0    1
#Q1 0.50 0.50
#Q2 0.25 0.75
#Q3 0.50 0.50

Or a more efficient approach is to call table once

prop.table(table(stack(dummy)[2:1]),1)
#   values
#ind     0    1
#  Q1 0.50 0.50
#  Q2 0.25 0.75
#  Q3 0.50 0.50
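
The result above is a table object. If a data frame with Question, X1, and X0 columns is wanted instead, a sketch like the following could be used. The names `tab` and `result` are illustrative, and picking the columns by the names "1" and "0" assumes both values occur somewhere in the data.

```r
dummy <- data.frame(Q1 = c(0, 1, 0, 1),
                    Q2 = c(1, 1, 0, 1),
                    Q3 = c(0, 1, 1, 0))

# Row-wise proportions: one row per question, one column per response value
tab <- prop.table(table(stack(dummy)[2:1]), 1)

# Pull the "1" and "0" columns out by name (assumes both values occur)
result <- data.frame(Question = rownames(tab),
                     X1 = tab[, "1"],
                     X0 = tab[, "0"],
                     row.names = NULL)
result
```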

Or another option is colMeans (inspired by @Cath's use of colSums)

X0 <- colMeans(!dummy)
data.frame(X1 = 1 - X0, X0)
#    X1   X0
#Q1 0.50 0.50
#Q2 0.75 0.25
#Q3 0.50 0.50

Upvotes: 2

Ronak Shah

Reputation: 388982

We can try apply with MARGIN = 2 and divide the count of each value by the total length of the column

t(apply(dummy, 2, function(x) table(x)/length(x)))

#     0    1
#Q1 0.50 0.50
#Q2 0.25 0.75
#Q3 0.50 0.50
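
One caveat with table-based approaches: if a question was answered with only 0s or only 1s, table(x) has a single entry for that column, the per-column results have different lengths, and apply no longer returns a matrix. A possible workaround, sketched here with a hypothetical data frame `dummy2`, is to fix the levels up front with factor:

```r
# Hypothetical data in which everyone answered 1 to Q2
dummy2 <- data.frame(Q1 = c(0, 1, 0, 1),
                     Q2 = c(1, 1, 1, 1))

# factor(levels = 0:1) keeps a slot for both values, even if one never occurs
res <- t(apply(dummy2, 2, function(x) table(factor(x, levels = 0:1)) / length(x)))
res
```

With the levels fixed, each column yields two proportions, so the result stays a matrix with one row per question.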

Upvotes: 4

Fr.

Reputation: 2885

Less elegantly than in the answer above:

d <- t(dummy)
cbind(X0 = (ncol(d) - rowSums(d)) / ncol(d), X1 = rowSums(d) / ncol(d))

Or, to avoid computing the same stuff twice, and to get a data frame:

d <- t(dummy)
i <- ncol(d)
j <- rowSums(d)
data.frame(Question = rownames(d), X0 = (i - j) / i, X1 = j / i)

There you go:

   Question   X0   X1
Q1       Q1 0.50 0.50
Q2       Q2 0.25 0.75
Q3       Q3 0.50 0.50

Upvotes: 1
