This should be super simple, but I can't seem to figure it out.
I am using the ggplot2movies library to get the movies data frame, and I am trying to summarize the data into a data frame that is easier to plot. In case you don't want to load the ggplot2movies library, a sample of the relevant data is:
# A tibble: 6 x 2
   year rating
  <int>  <dbl>
1  1971    6.4
2  1939    6
3  1941    8.2
4  1996    8.2
5  1975    3.4
6  2000    4.3
I have the following working code, based on the plyr library:
years <- ddply(movies,"year",summarize,rating=mean(rating))
This gives the following result, which is perfect for a scatter plot or line chart:
> head(years)
  year   rating
1 1893 7.000000
2 1894 4.888889
3 1895 5.500000
4 1896 5.269231
5 1897 4.677778
6 1898 5.040000
However, I can't work out how to add a count column, so that I have a third variable to map to something like size, which would show the volume of movies produced each year on the plot. It should be something simple like:
years <- ddply(movies,"year",summarize,rating=mean(rating),count=count(years))
However, this gives an error:
Error in summarise_impl(.data, dots) : Evaluation error: no applicable method for 'groups' applied to an object of class "character".
I could add a column to the original data frame that is just a repeating value of 1, and then sum that column. But given how versatile and useful R is, I think there must be a much simpler and more appropriate way to do it within the ddply call.
Upvotes: 3
Views: 288
Reputation: 76653
You can use n() from dplyr to get the number of rows in each group.
library(ggplot2movies)
library(dplyr)
data("movies")
movies %>%
  group_by(year) %>%
  summarise(rating = mean(rating),
            years = n()) -> mvs
head(mvs, 10)
## A tibble: 10 x 3
# year rating years
# <int> <dbl> <int>
# 1 1893 7 1
# 2 1894 4.89 9
# 3 1895 5.5 3
# 4 1896 5.27 13
# 5 1897 4.68 9
# 6 1898 5.04 5
# 7 1899 4.28 9
# 8 1900 4.73 16
# 9 1901 4.68 28
#10 1902 4.9 9
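To see what n() does without loading ggplot2movies, here is a minimal sketch on a small made-up data frame. The name toy and all of its values are invented for illustration and do not come from movies; the count column is called count here rather than years only to make its meaning obvious.

```r
library(dplyr)

# Hypothetical toy data: invented years and ratings, not taken from `movies`
toy <- data.frame(
  year   = c(1990L, 1990L, 1991L, 1991L, 1991L),
  rating = c(6, 8, 5, 7, 6)
)

toy_summary <- toy %>%
  group_by(year) %>%
  summarise(rating = mean(rating),  # mean rating within each year
            count  = n())           # number of rows (movies) within each year

print(toy_summary)
# 1990: rating 7, count 2
# 1991: rating 6, count 3
```

n() takes no arguments and only works inside dplyr verbs such as summarise(), where it returns the size of the current group.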
Another solution uses the plyr package, as suggested by the OP.
library(plyr)
mvs2 <- ddply(movies, "year", summarize,
              rating = mean(rating), years = length(year))
all.equal(mvs, mvs2)
#[1] TRUE
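Since the goal in the question is to show the volume of movies on the plot, one possible way to use the count column is to map it to point size in ggplot2. This is only a sketch: mvs_toy is a hypothetical stand-in with invented values in the same shape as mvs (year, rating, years), and the choice of a line plus sized points is an assumption about the desired chart.

```r
library(ggplot2)

# Hypothetical summary with the same columns as `mvs`; values are invented
mvs_toy <- data.frame(
  year   = c(1990L, 1991L, 1992L),
  rating = c(6.1, 5.8, 6.4),
  years  = c(120L, 150L, 90L)   # count of movies per year
)

p <- ggplot(mvs_toy, aes(x = year, y = rating)) +
  geom_line() +                        # mean rating over time
  geom_point(aes(size = years)) +      # point size reflects movies per year
  labs(size = "Movies per year")

print(p)
```

With the real data, replacing mvs_toy with mvs should give the same kind of chart.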
Upvotes: 2