HelpWithR
HelpWithR

Reputation: 17

Compute table to calculate total percentage from multiple surveys

I have a data frame from a rating survey that reflects percentages. All percentages are dbl:

 sample_size  club      bad(%)      below_avg(%)    neutral(%)   good(%)   very_good(%)
  134          A         10          30               45          5         10      
  1586         B         12          30               24          4         30  
  588          C         43          10               17          16        14 
  345          B         30          51               10           5         4
  2500         C         21          19               30          15        15

I want to generate a table that shows the total percentage of respondents who rated the product good, neutral and bad, grouped by club. Do I have to create additional columns that show respondent count for each rating, or is there a more direct way to generate the table using arithmetic within the table() function?

Edit: For clarity, I need the following output:

club    bad     neutral   good

A      13.4    60.3      6.7

B      15.21   22.02     4.19

C      25.18   27.52     15.19

These values are computed as follows:

response_count for each club (computed as percentage of corresponding row sample size)

total percentage for each club = sum(response_count)/sum(sample_size of club)*100

For reference:

sum(sample_size of A)= 134

sum(sample_size of B)= 1586+345= 1930

sum(sample_size of C)= 588+2500= 3088

Upvotes: 0

Views: 102

Answers (3)

r2evans
r2evans

Reputation: 161085

library(dplyr)
dat %>%
  group_by(club) %>%
  summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100)/sum(sample_size)))
# # A tibble: 3 x 4
#   club  bad... neutral... good...
# * <chr>  <dbl>      <dbl>   <dbl>
# 1 A       10         45      5   
# 2 B       15.2       21.5    4.18
# 3 C       25.2       27.5   15.2 

Since you say you're getting NA in your results, that means your sample data here is not representative (and you didn't mention it). You should be able to add na.rm=TRUE to the sums to fix this. As a walk-through:

dat$bad...[3] <- NA
dat %>%
  group_by(club) %>%
  summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100)/sum(sample_size)))
# # A tibble: 3 x 4
#   club  bad... neutral... good...
# * <chr>  <dbl>      <dbl>   <dbl>
# 1 A       10         45      5   
# 2 B       15.2       21.5    4.18
# 3 C       NA         27.5   15.2 
dat %>%
  group_by(club) %>%
  summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100, na.rm = TRUE)/sum(sample_size, na.rm = TRUE)))
# # A tibble: 3 x 4
#   club  bad... neutral... good...
# * <chr>  <dbl>      <dbl>   <dbl>
# 1 A       10         45      5   
# 2 B       15.2       21.5    4.18
# 3 C       17.0       27.5   15.2 

(Of course this "C" bad rating will be wrong, because I removed good data ... it's just part of the example. Try it with your real data.)


Data

dat <- structure(list(sample_size = c(134L, 1586L, 588L, 345L, 2500L), club = c("A", "B", "C", "B", "C"), bad... = c(10, 12, 43, 30, 21), below_avg... = c(30L, 30L, 10L, 51L, 19L), neutral... = c(45L, 24L, 17L, 10L, 30L), good... = c(5L, 4L, 16L, 5L, 15L), very_good... = c(10L, 30L, 14L, 4L, 15L)), row.names = c(NA, -5L), class = "data.frame")

Upvotes: 2

Eric
Eric

Reputation: 2859

Solution using the dplyr package.

df <- structure(list(sample_size = c(134L, 1586L, 588L, 345L, 2500L
), club = c("A", "B", "C", "B", "C"), bad = c(10L, 12L, 43L, 
30L, 21L), below_avg = c(30L, 30L, 10L, 51L, 19L), neutral = c(45L, 
24L, 17L, 10L, 30L), good = c(5L, 4L, 16L, 5L, 15L), very_good = c(10L, 
30L, 14L, 4L, 15L)), class = "data.frame", row.names = c(NA, 
-5L))

library(dplyr)

df %>%
  group_by(club) %>% 
  summarise_all(., ~if(is.numeric(.)) sum(.)) %>%
  rowwise() %>% 
  mutate(
    total_percentage = sum(bad, neutral, good) / sum(bad, neutral, good, below_avg, very_good)
    )%>% 
  select(club, sample_size, bad, neutral, good, total_percentage)

#> # A tibble: 3 x 6
#> # Rowwise: 
#>   club  sample_size   bad neutral  good total_percentage
#>   <chr>       <int> <int>   <int> <int>            <dbl>
#> 1 A             134    10      45     5            0.6  
#> 2 B            1931    42      34     9            0.425
#> 3 C            3088    64      47    31            0.71

Created on 2021-03-10 by the reprex package (v0.3.0)

Upvotes: 0

TarJae
TarJae

Reputation: 79286

library(tidyverse)
tribble(
~sample_size,  ~club,     ~bad,      ~below_avg,    ~neutral,   ~good,   ~very_good,
134, "A", 10, 30, 45, 5, 10, 
1586, "B", 12, 30, 24, 4, 30, 
588, "C", 43, 10, 17, 16, 14, 
345, "B", 30, 51, 10, 5, 4, 
2500, "C", 21, 19, 30, 15, 15) %>% 
  group_by(club) %>% 
  summarise(total_percent = sum(bad, neutral, good))

# Output

# A tibble: 3 x 2
  club  total_percent
  <chr>         <dbl>
1 A                60
2 B                85
3 C               142

or:

library(tidyverse)
tribble(
~sample_size,  ~club,     ~bad,      ~below_avg,    ~neutral,   ~good,   ~very_good,
134, "A", 10, 30, 45, 5, 10, 
1586, "B", 12, 30, 24, 4, 30, 
588, "C", 43, 10, 17, 16, 14, 
345, "B", 30, 51, 10, 5, 4, 
2500, "C", 21, 19, 30, 15, 15) %>% 
  group_by(club) %>% 
  summarise(across(where(is.numeric), sum)) %>% 
  select(-below_avg, -very_good)

# Output
# A tibble: 3 x 5
  club  sample_size   bad neutral  good
  <chr>       <dbl> <dbl>   <dbl> <dbl>
1 A             134    10      45     5
2 B            1931    42      34     9
3 C            3088    64      47    31

Upvotes: 1

Related Questions