Reputation: 63
Could someone please give me their advice?
What is the best way to add a row of column totals at the bottom of the table below?
I tried colSums, but it gave me an error: Error in colSums(., mpaa_rating, na.rm = FALSE, dims = 1) : unused argument (mpaa_rating). I was evidently not using it correctly or not putting it in the right place. What I tried was colSums(mpaa_rating, na.rm = FALSE, dims = 1) %>% just above spread.
Many thanks in advance, Christine
reprex::reprex_info()
library(tidyverse)  # provides tribble(), the dplyr verbs, spread() and %>%
movie_help <- data.frame(tribble(
~mpaa_rating, ~genre,
"PG", "Action & Adventure",
"R", "Mystery & Suspense",
"R", "Drama",
"R", "Drama",
"R", "Drama",
"PG", "Action & Adventure",
"PG-13", "Comedy",
"R", "Comedy",
"R", "Action & Adventure",
"R", "Drama",
"R", "Drama",
"G", "Drama",
"R", "Comedy",
"R", "Drama",
"R", "Mystery & Suspense",
"R", "Musical & Performing Arts",
"Unrated", "Drama",
"R", "Drama",
"PG-13", "Drama",
"PG-13", "Drama"
))
movie_help %>%
  filter(!is.na(genre), !is.na(mpaa_rating)) %>%
  count(genre, mpaa_rating) %>%
  group_by(genre) %>%
  mutate(prop = n) %>%
  mutate(Total = sum(n)) %>%
  select(-n) %>%
  spread(key = mpaa_rating, value = prop)
#> # A tibble: 5 x 7
#> # Groups:   genre [5]
#>   genre                     Total     G    PG `PG-13`     R Unrated
#> * <chr>                     <int> <int> <int>   <int> <int>   <int>
#> 1 Action & Adventure            3    NA     2      NA     1      NA
#> 2 Comedy                        3    NA    NA       1     2      NA
#> 3 Drama                        11     1    NA       2     7       1
#> 4 Musical & Performing Arts     1    NA    NA      NA     1      NA
#> 5 Mystery & Suspense            2    NA    NA      NA     2      NA
Upvotes: 2
Views: 907
Reputation: 1124
To get the sum at the bottom, I like to use the adorn_totals function from the janitor package. The janitor package has many small helper functions for cleaning up tables; see the package documentation for more. Another favorite of mine is janitor::clean_names, which sanitizes column names into a uniform style.
In your case you can simply do:
movie_help %>%
  filter(!is.na(genre), !is.na(mpaa_rating)) %>%
  count(genre, mpaa_rating) %>%
  group_by(genre) %>%
  mutate(prop = n) %>%
  mutate(Total = sum(n)) %>%
  select(-n) %>%
  spread(key = mpaa_rating, value = prop, fill = 0) %>%
  janitor::adorn_totals('row') %>%
  janitor::clean_names()
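To make the effect of the last two steps easier to see, here is a minimal sketch on a tiny made-up table. The `counts` data frame and its values are hypothetical and only mimic the shape of the spread() result; it assumes the janitor package is installed.

```r
library(janitor)  # assumed to be installed

# Hypothetical wide count table, similar in shape to the spread() result
counts <- data.frame(
  genre   = c("Comedy", "Drama"),
  `PG-13` = c(1, 2),
  R       = c(2, 7),
  check.names = FALSE
)

# Append a "Total" row holding the sum of every numeric column
with_total <- adorn_totals(counts, where = "row")

# Sanitize the column names, e.g. `PG-13` becomes pg_13
cleaned <- clean_names(with_total)

print(cleaned)
```

Note that clean_names also lower-cases the headers, so if you want to keep the ratings as they are (`PG-13`, `R`), just leave that last step out.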
Upvotes: 4
Reputation: 10781
We can use table and chisq.test to perform the test you want:
chisq.test(table(movie_help))
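If all you need is the counts with totals, the same table() call can be combined with base R's addmargins(). A minimal sketch, using a small made-up `movies` data frame (hypothetical data, not the poster's) so it runs on its own:

```r
# Small made-up sample in the same shape as movie_help (hypothetical data)
movies <- data.frame(
  mpaa_rating = c("PG", "R", "R", "PG-13", "R", "G"),
  genre       = c("Action", "Drama", "Drama", "Comedy", "Comedy", "Drama"),
  stringsAsFactors = FALSE
)

# Cross-tabulate genre (rows) against rating (columns)
tab <- table(movies$genre, movies$mpaa_rating)

# addmargins() appends a "Sum" row and a "Sum" column to the table
with_totals <- addmargins(tab)

print(with_totals)
```

The result is still a table object rather than a data frame, so this suits a quick look more than further dplyr processing.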
We can also manually calculate the totals:
dat <- movie_help %>%
  filter(!is.na(genre), !is.na(mpaa_rating)) %>%
  count(genre, mpaa_rating) %>%
  group_by(genre) %>%
  mutate(prop = n) %>%
  mutate(Total = sum(n)) %>%
  select(-n) %>%
  spread(key = mpaa_rating, value = prop)
# Sum every count column (ignoring NAs) and append the result as a "Total" row
bind_rows(dat,
  cbind(tibble(genre = 'Total'), summarise_all(ungroup(dat)[, -1], sum, na.rm = TRUE)))
  genre                     Total     G    PG `PG-13`     R Unrated
  <chr>                     <int> <int> <int>   <int> <int>   <int>
1 Action & Adventure            3    NA     2      NA     1      NA
2 Comedy                        3    NA    NA       1     2      NA
3 Drama                        11     1    NA       2     7       1
4 Musical & Performing Arts     1    NA    NA      NA     1      NA
5 Mystery & Suspense            2    NA    NA      NA     2      NA
6 Total                        20     1     2       3    13       1
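The colSums() attempt from the question can also be made to work, as long as it only sees the numeric columns and is applied after the table has been spread. A rough base R sketch on a made-up `dat` (hypothetical values, standing in for the spread() result):

```r
# Hypothetical wide table with NA where a genre/rating pair never occurs
dat <- data.frame(
  genre = c("Comedy", "Drama"),
  PG    = c(NA, 1),
  R     = c(2, 7),
  stringsAsFactors = FALSE
)

# colSums() needs numeric input, so pick out the numeric columns first
num_cols <- sapply(dat, is.numeric)
totals   <- colSums(dat[num_cols], na.rm = TRUE)

# Turn the named vector into a one-row data frame and append it
total_row <- data.frame(genre = "Total", t(totals), stringsAsFactors = FALSE)
result    <- rbind(dat, total_row)

print(result)
```

The original error came from passing mpaa_rating as an extra positional argument: inside a pipe the data frame is already the first argument of colSums(), so nothing is left for mpaa_rating to match.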
Upvotes: 0