saul

Reputation: 299

plot count of discrete data by date

I am new to ggplot2 and trying to plot a continuous histogram showing the evolution of reviews by date and rating.

My data set looks like this:

        date rating reviews
1 2017-11-24      1 some text here
2 2017-11-24      1 some text here
3 2017-12-02      5 some text here
4 2017-11-24      3 some text here
5 2017-11-24      3 some text here
6 2017-11-24      4 some text here

What I want to get is something like this:

for rating == 1

        date    count
1  2017-11-24      2
2  2017-11-25      7
.
.
.

and so on for rating == 2 and 3

I've tried

ggplot(aes(x = date, y = rating), data = df) + geom_line()

but it gives me only the rating on the y axis, not the counts.

Upvotes: 2

Views: 826

Answers (2)

Antonio

Reputation: 327

Using some dummy data:

  library(tidyverse)
  set.seed(999)

  # Dummy data: 2000 reviews on random days, each with a rating from 1 to 5
  df <- data.frame(
    date   = sample(seq(as.Date('2017/01/01'), as.Date('2017/04/01'), by = "day"),
                    2000, replace = TRUE),
    rating = sample(1:5, 2000, replace = TRUE)
  )
  df$rating <- as.factor(df$rating)

  # Count reviews per date and rating, then draw one line per rating
  df %>%
    group_by(date, rating) %>%
    summarise(n = n()) %>%
    ggplot(aes(date, n, color = rating)) +
    geom_line() +
    geom_point()
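
As a side note, the group_by() + summarise() step can be written more compactly with dplyr's count(), which returns one row per combination with the tally in a column named n. A minimal sketch, assuming a small hand-made data frame (toy is a hypothetical name, filled with the six rows shown in the question minus the review text):

```r
library(dplyr)

# Hypothetical toy data mirroring the layout in the question
toy <- data.frame(
  date   = as.Date(c("2017-11-24", "2017-11-24", "2017-12-02",
                     "2017-11-24", "2017-11-24", "2017-11-24")),
  rating = c(1, 1, 5, 3, 3, 4)
)

# One row per (date, rating) pair; the number of reviews lands in column n
counts <- toy %>% count(date, rating)
counts
```

The resulting counts data frame has the date / count shape asked for in the question and can be piped straight into ggplot() in the same way.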

Upvotes: 1

M--

Reputation: 29237

You can use dplyr to build the desired dataset and pass it to ggplot():

library(dplyr)
library(ggplot2)

# Count reviews per rating and date, then draw one line per rating
sample_data %>%
  group_by(rating, date) %>%
  summarise(n = n()) %>%
  ggplot(aes(x = date, y = n, group = rating, color = as.factor(rating))) +
  geom_line(size = 1.5) +
  geom_point()

Data:

sample_data <- structure(list(id = c(1L, 2L, 2L, 3L, 4L, 5L, 5L, 6L, 6L, 1L,           
     2L, 3L, 3L, 4L, 5L, 6L, 1L, 2L, 2L, 2L, 3L, 4L, 5L, 6L), date = structure(c(1L, 
     1L, 3L, 7L, 1L, 1L, 1L, 1L, 5L, 2L, 3L, 8L, 8L, 3L, 4L, 5L, 5L,                 
     6L, 6L, 6L, 9L, 6L, 6L, 6L), .Label = c("2017-11-24", "2017-11-25",             
     "2017-11-26", "2017-11-27", "2017-11-28", "2017-11-29", "2017-12-02",           
     "2017-12-04", "2017-12-08"), class = "factor"), rating = c(1L,                  
     1L, 1L, 5L, 3L, 3L, 3L, 4L, 4L, 1L, 1L, 5L, 5L, 3L, 3L, 4L, 1L,                 
     1L, 1L, 1L, 5L, 3L, 3L, 4L), reviews = structure(c(1L, 1L, 1L,                  
     1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,                 
     1L, 1L, 1L, 1L, 1L), .Label = "review", class = "factor")), .Names = c("id",    
     "date", "rating", "reviews"), row.names = c(NA, 24L), class = "data.frame")   
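
Note that date in sample_data is stored as a factor, so ggplot2 treats the x axis as discrete and spaces the dates evenly regardless of the gaps between them. If a true time axis is wanted, one option is to convert the column to Date before counting. A minimal sketch, assuming a small made-up data frame (toy is hypothetical, and the "%b %d" label format is just one choice):

```r
library(dplyr)
library(ggplot2)

# Hypothetical data with dates stored as a factor, as in sample_data
toy <- data.frame(
  date   = factor(c("2017-11-24", "2017-11-24", "2017-11-26", "2017-12-02")),
  rating = c(1L, 1L, 1L, 5L)
)

counts <- toy %>%
  mutate(date = as.Date(as.character(date))) %>%  # factor -> character -> Date
  count(date, rating)

# With a Date column, the x axis is continuous and gaps between days are kept
p <- ggplot(counts, aes(x = date, y = n, color = factor(rating))) +
  geom_line() +
  geom_point() +
  scale_x_date(date_labels = "%b %d")
```

Calling print(p) draws the plot. Going through as.character() first is the cautious route, since it converts the factor labels rather than the underlying integer codes.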

Upvotes: 1
