user3496218
user3496218

Reputation: 209

R Programming and GGPLOT2 - how to SUM data to create a line graph

I am very new to R and am looking to create a line graph using GGPLOT2 for a set of data I have. The data is comprised of Dates,Comments,Qtys - there are multiple date/comment/qtys combination but what I need to do is get the data to a place where each date and comment combination has ONE value for qty - essentially to SUM them. I know how do do that in SQL and am using the sqldf package to try to achieve the same but instead I get one row returned.

Sample Data:

Date     Comments         BuysToForecast
   <fct>    <fct>                     <int>
 1 4/6/2018 Approved                    573
 2 4/6/2018 ""                          363
 3 4/6/2018 ""                          503
 4 4/6/2018 ""                          324
 5 4/6/2018 Cancelled                  1844
 6 4/6/2018 Consumed By SO's            422
 7 4/6/2018 Consumed By SO's            790
 8 4/6/2018 Reduced                     515
 9 4/6/2018 Reduced                     790
10 4/6/2018 Approved                   1378

what I need is for example = there would only be one 4/5/2018/Consumed By SO's/Qty value that would have all of the qtys. When I use the below SQL in sqldf I get said the results shown:

SQLDF:

dp1 <- sqldf("select Date, 
                     Comments,
                     SUM(BuysToForecast) AS 'SUM' from dp")

head(dp1)(example data): 

       Date Comments     SUM
1 5/30/2018 Approved 2062046

apologies ahead of time for any formatting issues - very new to R

Upvotes: 0

Views: 2806

Answers (2)

G. Grothendieck
G. Grothendieck

Reputation: 270170

You need to use group by to have a sum for each group. The SQL statement shown in the question only asks for an overall sum. Assuming the data in the Note at the end (taken from the question).

library(ggplot2)
library(sqldf)

dp <- dp0
dp$Date <- as.Date(dp$Date, "%m/%d/%Y")

dp1 <- sqldf("select Date, Comments, SUM(BuysToForecast) AS 'SUM' 
              from dp 
              group by Date, Comments")

ggplot(dp1, aes(Date, SUM, color = Comments)) + 
  geom_point() +
  geom_line()

Note

Lines <- "Date     Comments         BuysToForecast
 1 4/6/2018 Approved                    573
 2 4/6/2018 \"\"                          363
 3 4/6/2018 \"\"                          503
 4 4/6/2018 \"\"                          324
 5 4/6/2018 \"Cancelled\"                  1844
 6 4/6/2018 \"Consumed By SO's\"            422
 7 4/6/2018 \"Consumed By SO's\"            790
 8 4/6/2018 Reduced                     515
 9 4/6/2018 Reduced                     790
10 4/6/2018 Approved                   1378"

dp0 <- read.table(text = Lines)

Upvotes: 0

eipi10
eipi10

Reputation: 93871

Here's an example with fake data that I hope is similar to your data.

library(tidyverse)
library(lubridate)

# Create fake data
set.seed(2)
n=1e3
dat = data_frame(date = sample(seq(ymd("2015-01-01"), ymd("2015-02-28"), length=60), n, replace=TRUE),
                 comment=sample(LETTERS[1:5], n, replace=TRUE),
                 value=sample(1:100, n, replace=TRUE))

# Convert date to factor to match your data
dat$date = format(dat$date, "%m/%d/%Y")
dat$date = factor(dat$date)

Okay, now we're ready to begin. First, we'll convert your "date" values to R's Date class. Right now your dates are coded as factor class, meaning R is treating it as a categorical variable. Converting it to Date class will result in ggplot treating them as dates, rather than categories. We'll use the lubridate mdy function (short for month-day-year, which is the order of values in your date column).

# Convert date to Date class
dat$date = mdy(dat$date)

Now there are two options. Pre-summarise the dat and feed the summarized data to ggplot:

dat %>% 
  group_by(date, comment) %>% 
  summarise(value=sum(value)) %>% 
  ungroup %>% 
ggplot(aes(date, value, colour=comment)) +
  geom_line() +
  geom_point()

Or, let ggplot take care of the summary for you:

ggplot(dat, aes(date, value, colour=comment)) +
  stat_summary(fun.y=sum, geom="line") +
  stat_summary(fun.y=sum, geom="point")

Either way, the plot looks like this:

enter image description here

Having ggplot do the summary was easier here, but sometimes you'll want the additional flexibility of processing the data outside of ggplot.

Upvotes: 2

Related Questions