Reputation: 209
I am very new to R and am looking to create a line graph using GGPLOT2 for a set of data I have. The data is comprised of Dates,Comments,Qtys - there are multiple date/comment/qtys combination but what I need to do is get the data to a place where each date and comment combination has ONE value for qty - essentially to SUM them. I know how do do that in SQL and am using the sqldf package to try to achieve the same but instead I get one row returned.
Sample Data:
Date Comments BuysToForecast
<fct> <fct> <int>
1 4/6/2018 Approved 573
2 4/6/2018 "" 363
3 4/6/2018 "" 503
4 4/6/2018 "" 324
5 4/6/2018 Cancelled 1844
6 4/6/2018 Consumed By SO's 422
7 4/6/2018 Consumed By SO's 790
8 4/6/2018 Reduced 515
9 4/6/2018 Reduced 790
10 4/6/2018 Approved 1378
what I need is for example = there would only be one 4/5/2018/Consumed By SO's/Qty value that would have all of the qtys. When I use the below SQL in sqldf I get said the results shown:
SQLDF:
dp1 <- sqldf("select Date,
Comments,
SUM(BuysToForecast) AS 'SUM' from dp")
head(dp1)(example data):
Date Comments SUM
1 5/30/2018 Approved 2062046
apologies ahead of time for any formatting issues - very new to R
Upvotes: 0
Views: 2806
Reputation: 270170
You need to use group by
to have a sum for each group. The SQL statement shown in the question only asks for an overall sum. Assuming the data in the Note at the end (taken from the question).
library(ggplot2)
library(sqldf)
dp <- dp0
dp$Date <- as.Date(dp$Date, "%m/%d/%Y")
dp1 <- sqldf("select Date, Comments, SUM(BuysToForecast) AS 'SUM'
from dp
group by Date, Comments")
ggplot(dp1, aes(Date, SUM, color = Comments)) +
geom_point() +
geom_line()
Lines <- "Date Comments BuysToForecast
1 4/6/2018 Approved 573
2 4/6/2018 \"\" 363
3 4/6/2018 \"\" 503
4 4/6/2018 \"\" 324
5 4/6/2018 \"Cancelled\" 1844
6 4/6/2018 \"Consumed By SO's\" 422
7 4/6/2018 \"Consumed By SO's\" 790
8 4/6/2018 Reduced 515
9 4/6/2018 Reduced 790
10 4/6/2018 Approved 1378"
dp0 <- read.table(text = Lines)
Upvotes: 0
Reputation: 93871
Here's an example with fake data that I hope is similar to your data.
library(tidyverse)
library(lubridate)
# Create fake data
set.seed(2)
n=1e3
dat = data_frame(date = sample(seq(ymd("2015-01-01"), ymd("2015-02-28"), length=60), n, replace=TRUE),
comment=sample(LETTERS[1:5], n, replace=TRUE),
value=sample(1:100, n, replace=TRUE))
# Convert date to factor to match your data
dat$date = format(dat$date, "%m/%d/%Y")
dat$date = factor(dat$date)
Okay, now we're ready to begin. First, we'll convert your "date" values to R's Date class. Right now your dates are coded as factor class, meaning R is treating it as a categorical variable. Converting it to Date class will result in ggplot treating them as dates, rather than categories. We'll use the lubridate mdy
function (short for month-day-year, which is the order of values in your date column).
# Convert date to Date class
dat$date = mdy(dat$date)
Now there are two options. Pre-summarise the dat and feed the summarized data to ggplot:
dat %>%
group_by(date, comment) %>%
summarise(value=sum(value)) %>%
ungroup %>%
ggplot(aes(date, value, colour=comment)) +
geom_line() +
geom_point()
Or, let ggplot take care of the summary for you:
ggplot(dat, aes(date, value, colour=comment)) +
stat_summary(fun.y=sum, geom="line") +
stat_summary(fun.y=sum, geom="point")
Either way, the plot looks like this:
Having ggplot do the summary was easier here, but sometimes you'll want the additional flexibility of processing the data outside of ggplot.
Upvotes: 2