How to reshape a dataframe into percentage of categorical data

Question

I have a dataframe that contains longitudinal data in long format:

mydata <- structure(list(
  record_id = c("a", "a", "a", "b", "b", "b", "c", "c", "c"),
  event = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
                    .Label = c("e2", "e3", "e4"), class = "factor"),
  var1 = structure(c(2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L),
                   .Label = c("no", "yes"), class = "factor"),
  var2 = structure(c(1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L),
                   .Label = c("no", "yes"), class = "factor"),
  var3 = structure(c(2L, 2L, 1L, 2L, 2L, 1L, 1L, 1L, 1L),
                   .Label = c("no", "yes"), class = "factor")),
  row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))

I need to transform this data into a dataframe that summarizes the percentage of "yes" values for each variable (var1, var2, var3) by event (e2, e3, e4), giving something like this:

mydata_result <- structure(list(
  Event = structure(c(1L, 1L, 1L, 2L, 2L, 2L, 3L, 3L, 3L),
                    .Label = c("e2", "e3", "e4"), class = "factor"),
  Variable = structure(c(1L, 2L, 3L, 1L, 2L, 3L, 1L, 2L, 3L),
                       .Label = c("var1", "var2", "var3"), class = "factor"),
  percentage_of_yes = c(0.33, 0.33, 0.66, 0, 0.33, 0.66, 0, 0, 0)),
  row.names = c(NA, -9L), class = c("tbl_df", "tbl", "data.frame"))

Thank you!

Sotos · Accepted Answer

Using the tidyverse, we can convert to long format, group by event and variable, and compute the proportion of "yes" values in each group:

library(tidyverse)

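mydata %>% 
  gather(var, val, -c(1:2)) %>%            # long format: one row per record/event/variable
  group_by(event, var) %>%                 # one group per event x variable combination
  summarise(new = sum(val == 'yes')/n())   # share of "yes" values in each group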

which gives,

# A tibble: 9 x 3
# Groups:   event [?]
  event var     new
  <fct> <chr> <dbl>
1 e2    var1  0.333
2 e2    var2  0.333
3 e2    var3  0.667
4 e3    var1  0    
5 e3    var2  0.333
6 e3    var3  0.667
7 e4    var1  0    
8 e4    var2  0    
9 e4    var3  0
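In current tidyr, gather() is superseded by pivot_longer(). The sketch below shows the same pipeline with pivot_longer(), assuming tidyr >= 1.0.0 and dplyr >= 1.0.0 (needed for summarise(.groups = ...)). It rebuilds the question's data as a plain data.frame so it runs on its own, uses mean(val == "yes") (the mean of a logical vector is the share of TRUE values, identical to sum(val == "yes")/n()), and names the columns Event, Variable and percentage_of_yes to match the asker's desired result. Dropping the grouping at the end is a choice made here, not part of the original answer.

```r
library(dplyr)  # assumes dplyr >= 1.0.0 for summarise(.groups = ...)
library(tidyr)  # assumes tidyr >= 1.0.0 for pivot_longer()

# Same data as `mydata` in the question, rebuilt as a plain data.frame
yes_no <- function(x) factor(x, levels = c("no", "yes"))
mydata <- data.frame(
  record_id = rep(c("a", "b", "c"), each = 3),
  event = factor(rep(c("e2", "e3", "e4"), times = 3)),
  var1 = yes_no(c("yes", "no", "no", "no", "no", "no", "no", "no", "no")),
  var2 = yes_no(c("no", "no", "no", "yes", "yes", "no", "no", "no", "no")),
  var3 = yes_no(c("yes", "yes", "no", "yes", "yes", "no", "no", "no", "no"))
)

mydata_result <- mydata %>%
  # long format: one row per record_id, event and variable
  pivot_longer(c(var1, var2, var3), names_to = "Variable", values_to = "val") %>%
  # group_by() can rename on the fly: `event` becomes the grouping column `Event`
  group_by(Event = event, Variable) %>%
  # proportion of "yes" per group; .groups = "drop" returns an ungrouped tibble
  summarise(percentage_of_yes = mean(val == "yes"), .groups = "drop")

mydata_result
```

Unlike the asker's mock-up, Variable comes out as character rather than factor and the proportions are unrounded (1/3 rather than 0.33); add mutate(Variable = factor(Variable), percentage_of_yes = round(percentage_of_yes, 2)) if the exact layout matters.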
