Reputation: 17
I have a data frame from a rating survey that reflects percentages. All percentages are dbl:
sample_size club bad(%) below_avg(%) neutral(%) good(%) very_good(%)
134 A 10 30 45 5 10
1586 B 12 30 24 4 30
588 C 43 10 17 16 14
345 B 30 51 10 5 4
2500 C 21 19 30 15 15
I want to generate a table that shows the total percentage of respondents who rated the product good, neutral and bad, grouped by club. Do I have to create additional columns that show respondent count for each rating, or is there a more direct way to generate the table using arithmetic within the table() function?
Edit: For clarity, I need the following output:
club bad neutral good
A 13.4 60.3 6.7
B 15.21 22.02 4.19
C 25.18 27.52 15.19
These values are computed as follows:
response_count for each club (computed as percentage of corresponding row sample size)
total percentage for each club = sum(response_count)/sum(sample_size of club)*100
For reference:
sum(sample_size of A)= 134
sum(sample_size of B)= 1586+345= 1930
sum(sample_size of C)= 588+2500= 3088
Upvotes: 0
Views: 102
Reputation: 161085
library(dplyr)
dat %>%
group_by(club) %>%
summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100)/sum(sample_size)))
# # A tibble: 3 x 4
# club bad... neutral... good...
# * <chr> <dbl> <dbl> <dbl>
# 1 A 10 45 5
# 2 B 15.2 21.5 4.18
# 3 C 25.2 27.5 15.2
Since you say you're getting NA
in your results, that means your sample data here is not representative (and you didn't mention it). You should be able to add na.rm=TRUE
to the sum
s to fix this. As a walk-through:
dat$bad...[3] <- NA
dat %>%
group_by(club) %>%
summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100)/sum(sample_size)))
# # A tibble: 3 x 4
# club bad... neutral... good...
# * <chr> <dbl> <dbl> <dbl>
# 1 A 10 45 5
# 2 B 15.2 21.5 4.18
# 3 C NA 27.5 15.2
dat %>%
group_by(club) %>%
summarize(across(c(bad..., neutral..., good...), ~ 100*sum(.*sample_size/100, na.rm = TRUE)/sum(sample_size, na.rm = TRUE)))
# # A tibble: 3 x 4
# club bad... neutral... good...
# * <chr> <dbl> <dbl> <dbl>
# 1 A 10 45 5
# 2 B 15.2 21.5 4.18
# 3 C 17.0 27.5 15.2
(Of course this "C"
bad rating will be wrong, because I removed good data ... it's just part of the example. Try it with your real data.)
Data
dat <- structure(list(sample_size = c(134L, 1586L, 588L, 345L, 2500L), club = c("A", "B", "C", "B", "C"), bad... = c(10, 12, 43, 30, 21), below_avg... = c(30L, 30L, 10L, 51L, 19L), neutral... = c(45L, 24L, 17L, 10L, 30L), good... = c(5L, 4L, 16L, 5L, 15L), very_good... = c(10L, 30L, 14L, 4L, 15L)), row.names = c(NA, -5L), class = "data.frame")
Upvotes: 2
Reputation: 2859
Solution using the dplyr
package.
df <- structure(list(sample_size = c(134L, 1586L, 588L, 345L, 2500L
), club = c("A", "B", "C", "B", "C"), bad = c(10L, 12L, 43L,
30L, 21L), below_avg = c(30L, 30L, 10L, 51L, 19L), neutral = c(45L,
24L, 17L, 10L, 30L), good = c(5L, 4L, 16L, 5L, 15L), very_good = c(10L,
30L, 14L, 4L, 15L)), class = "data.frame", row.names = c(NA,
-5L))
library(dplyr)
df %>%
group_by(club) %>%
summarise_all(., ~if(is.numeric(.)) sum(.)) %>%
rowwise() %>%
mutate(
total_percentage = sum(bad, neutral, good) / sum(bad, neutral, good, below_avg, very_good)
)%>%
select(club, sample_size, bad, neutral, good, total_percentage)
#> # A tibble: 3 x 6
#> # Rowwise:
#> club sample_size bad neutral good total_percentage
#> <chr> <int> <int> <int> <int> <dbl>
#> 1 A 134 10 45 5 0.6
#> 2 B 1931 42 34 9 0.425
#> 3 C 3088 64 47 31 0.71
Created on 2021-03-10 by the reprex package (v0.3.0)
Upvotes: 0
Reputation: 79286
library(tidyverse)
tribble(
~sample_size, ~club, ~bad, ~below_avg, ~neutral, ~good, ~very_good,
134, "A", 10, 30, 45, 5, 10,
1586, "B", 12, 30, 24, 4, 30,
588, "C", 43, 10, 17, 16, 14,
345, "B", 30, 51, 10, 5, 4,
2500, "C", 21, 19, 30, 15, 15) %>%
group_by(club) %>%
summarise(total_percent = sum(bad, neutral, good))
# Output
# A tibble: 3 x 2
club total_percent
<chr> <dbl>
1 A 60
2 B 85
3 C 142
or:
library(tidyverse)
tribble(
~sample_size, ~club, ~bad, ~below_avg, ~neutral, ~good, ~very_good,
134, "A", 10, 30, 45, 5, 10,
1586, "B", 12, 30, 24, 4, 30,
588, "C", 43, 10, 17, 16, 14,
345, "B", 30, 51, 10, 5, 4,
2500, "C", 21, 19, 30, 15, 15) %>%
group_by(club) %>%
summarise(across(where(is.numeric), sum)) %>%
select(-below_avg, -very_good)
# Output
# A tibble: 3 x 5
club sample_size bad neutral good
<chr> <dbl> <dbl> <dbl> <dbl>
1 A 134 10 45 5
2 B 1931 42 34 9
3 C 3088 64 47 31
Upvotes: 1