Kriti
Kriti

Reputation: 45

Add multiple columns counting frequencies in rows in R

I have a dataset of protocols that participants followed and they chose how frequently they followed the protocol - never, sometimes, always. This is coded as 1, 2 and 3.

Sample df

Protocol 1 Protocol 2 Protocol 3
1 2 3
3 2 3
2 1 2
2 3 3
1 3 3
3 3 3

Each row corresponds to one participant saying how frequently they followed each protocol (never = 1, sometimes = 2, always = 3)

I want to transpose the df and add three columns counting the frequency of each option of each protocol and percentage in brackets

So, it would look something like this:

Never Sometimes Always
Protocol 1 1 3 2 2 1 3 2(33.33%) 2(33.33%) 2(33.33%)
Protocol 2 2 2 1 3 3 3 1(16.67%) 2(33.33%) 3(50%)
Protocol 3 3 3 2 3 3 3 0(0%) 1(16.67%) 5(83.33%)

I tried multiple solutions from stackoverflow and others but nothing seems to work. Even if I can't show percentages and just the frequency, that would work too.

Upvotes: 2

Views: 51

Answers (2)

Zaw
Zaw

Reputation: 1474

Here is another approach.

library(tidyverse)
library(glue)

dat1 <- dat %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  summarise(
    Never = glue("{sum(value == 1)} ({round(mean(value == 1) * 100, 2)}%)"),
    Sometiimes = glue("{sum(value == 2)} ({round(mean(value == 2) * 100, 2)}%)"),
    Always = glue("{sum(value == 3)} ({round(mean(value == 3) * 100, 2)}%)"),
  )

dat2 <- dat %>% 
  pivot_longer(everything()) %>% 
  group_by(name) %>% 
  mutate(n = row_number()) %>% 
  ungroup() %>% 
  pivot_wider(
    id_cols = name,
    names_from = n,
    values_from = value
  ) %>% 
  left_join(dat1, by = "name")

dat2

# # A tibble: 3 x 10
#   name         `1`   `2`   `3`   `4`   `5`   `6` Never      Sometiimes Always    
#   <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <glue>     <glue>     <glue>    
# 1 Protocol 1     1     3     2     2     1     3 2 (33.33%) 2 (33.33%) 2 (33.33%)
# 2 Protocol 2     2     2     1     3     3     3 1 (16.67%) 2 (33.33%) 3 (50%)   
# 3 Protocol 3     3     3     2     3     3     3 0 (0%)     1 (16.67%) 5 (83.33%)

Data:

dat <- tibble(
  `Protocol 1` = c(1, 3, 2, 2, 1, 3),
  `Protocol 2` = c(2, 2, 1, 3, 3, 3),
  `Protocol 3` = c(3, 3, 2, 3, 3, 3)
)

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 389235

Here is one tidyverse approach -

library(dplyr)
library(tidyr)

df %>%
  pivot_longer(cols = everything()) %>%
  mutate(value = factor(value)) %>%
  count(name, value, .drop = FALSE) %>%
  group_by(name) %>%
  mutate(n = sprintf('%d (%.2f %%)', n, prop.table(n) * 100), 
         value = recode(value,`1`='never',`2` = 'sometimes',`3` = 'always')) %>%
  ungroup %>%
  pivot_wider(names_from = value, values_from = n)

#  name      never       sometimes   always     
#  <chr>     <chr>       <chr>       <chr>      
#1 Protocol1 2 (33.33 %) 2 (33.33 %) 2 (33.33 %)
#2 Protocol2 1 (16.67 %) 2 (33.33 %) 3 (50.00 %)
#3 Protocol3 0 (0.00 %)  1 (16.67 %) 5 (83.33 %)

data

df <- structure(list(Protocol1 = c(1L, 3L, 2L, 2L, 1L, 3L), Protocol2 = c(2L, 
2L, 1L, 3L, 3L, 3L), Protocol3 = c(3L, 3L, 2L, 3L, 3L, 3L)), 
class = "data.frame", row.names = c(NA, -6L))

Upvotes: 2

Related Questions