Reputation: 299
I am new to ggplot2 and am trying to plot a continuous histogram showing the evolution of reviews by date and rating.
My data set looks like this:
        date rating        reviews
1 2017-11-24      1 some text here
2 2017-11-24      1 some text here
3 2017-12-02      5 some text here
4 2017-11-24      3 some text here
5 2017-11-24      3 some text here
6 2017-11-24      4 some text here
What I want to get is something like this for rating == 1:
        date count
1 2017-11-24     2
2 2017-11-25     7
...
and so on for rating == 2 and 3.
I've tried
ggplot(aes(x = date, y = rating), data = df) + geom_line()
but it only puts the rating on the y axis, not the counts.
Upvotes: 2
Views: 826
Reputation: 327
Just using some dummy data:
library(tidyverse)
set.seed(999)
# 2000 random review dates between 2017-01-01 and 2017-04-01, each with a random rating from 1 to 5
df <- data.frame(date = sample(seq(as.Date('2017/01/01'), as.Date('2017/04/01'), by = "day"), 2000, replace = TRUE),
                 rating = sample(1:5, 2000, replace = TRUE))
df$rating <- as.factor(df$rating)
# count the reviews per date and rating, then draw one line per rating
df %>%
  group_by(date, rating) %>%
  summarise(n = n()) %>%
  ggplot(aes(date, n, color = rating)) +
  geom_line() +
  geom_point()
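In case it helps to see the intermediate table on its own: the group_by() + summarise() step can also be written with dplyr's count(). Below is a minimal sketch on a toy data frame that mirrors the six rows from the question; the names reviews_df and counts are made up for illustration.

```r
library(dplyr)

# Hypothetical toy data copying the six rows shown in the question
reviews_df <- data.frame(
  date   = as.Date(c("2017-11-24", "2017-11-24", "2017-12-02",
                     "2017-11-24", "2017-11-24", "2017-11-24")),
  rating = c(1, 1, 5, 3, 3, 4)
)

# count() is shorthand for group_by(rating, date) followed by summarise(n = n());
# the result has one row per rating/date pair and a count column named n
counts <- reviews_df %>% count(rating, date)

# Rows for a single rating, in the date/count shape asked for in the question
filter(counts, rating == 1)
```

The counts table can then be piped into ggplot() exactly as above.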
Upvotes: 1
Reputation: 29237
You can use dplyr to get the desired dataset and pass it into ggplot():
library(dplyr)
library(ggplot2)
# count the reviews for each rating/date pair, then plot one line per rating
sample_data %>%
  group_by(rating, date) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = date, y = n, group = rating, color = as.factor(rating))) +
  geom_line(size = 1.5) +
  geom_point()
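One optional refinement, offered as a sketch and not as part of the approach above: the date column in this data is stored as a factor, so the x axis is discrete, and days on which a rating got no reviews are simply missing. Assuming you want a real date axis with explicit zero counts, something along these lines may work. It uses tidyr::complete(); the names mini and filled are invented for the example.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature data set with dates stored as a factor
mini <- data.frame(
  date   = factor(c("2017-11-24", "2017-11-24", "2017-11-26", "2017-11-24")),
  rating = c(1L, 1L, 1L, 3L)
)

filled <- mini %>%
  mutate(date = as.Date(as.character(date))) %>%  # factor -> Date
  count(rating, date) %>%                         # reviews per rating and day
  # add a row for every rating on every day in the observed range,
  # setting n to 0 where no reviews exist
  complete(rating,
           date = seq(min(date), max(date), by = "day"),
           fill = list(n = 0L))
```

With that, each rating has one row per calendar day, so geom_line() drops to zero on days without reviews instead of connecting straight across the gap.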
Data:
sample_data <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 5L, 5L, 6L, 6L, 1L,
2L, 3L, 3L, 4L, 5L, 6L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L), date = structure(c(1L,
1L, 3L, 7L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 8L, 8L, 3L, 4L, 5L, 5L,
6L, 6L, 6L, 9L, 6L, 6L, 6L), .Label = c("2017-11-24", "2017-11-25",
"2017-11-26", "2017-11-27", "2017-11-28", "2017-11-29", "2017-12-02",
"2017-12-04", "2017-12-08"), class = "factor"), rating = c(1L,
1L, 1L, 5L, 3L, 3L, 3L, 4L, 4L, 1L, 1L, 5L, 5L, 3L, 3L, 4L, 1L,
1L, 1L, 1L, 5L, 3L, 3L, 4L), reviews = structure(c(1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), .Label = "review", class = "factor")), .Names = c("id",
"date", "rating", "reviews"), row.names = c(NA, 24L), class = "data.frame")
Upvotes: 1