Reputation: 313
Say I have the following data:
Date Month Year Miles Activity
3/1/2014 3 2014 72 Walking
3/1/2014 3 2014 85 Running
3/2/2014 3 2014 42 Running
4/1/2014 4 2014 65 Biking
1/1/2015 1 2015 21 Walking
1/2/2015 1 2015 32 Running
I want to make graphs that display the sum of each month's date for miles, grouped and colored by year. I know that I can make a separate data frame with the sum of the miles per month per activity, but the issue is in displaying. Here in Excel is basically what I want--the sums displayed chronologically and colored by activity.
I know ggplot2 has a scale_x_date
command, but I run into issues on "both sides" of the problem--if I use the Date
column as my X variable, they're not summed. But if I sum my data how I want it in a separate data frame (i.e., where every activity for every month has just one row), I can't use both Month
and Year
as my x-axis--at least, not in any way that I can get scale_x_date
to understand.
(And, I know, if Excel is graphing it correctly why not just use Excel--unfortunately, my data is so large that Excel was running very slowly and it's not feasible to keep using it.) Any ideas?
Upvotes: 1
Views: 607
Reputation: 457
The below worked fine for me with the small dataset. If you convert you data.frame to a data.table you can sum the data up to the mile per activity and month level with just a couple preprocessing steps. I've left some comments in the code to give you an idea of what's going on but it should be pretty self-explanatory.
# Assuming your dataframe looks like this
df <- data.frame(Date = c('3/1/2014','3/1/2014','4/2/2014','5/1/2014','5/1/2014','6/1/2014','6/1/2014'), Miles = c(72,14,131,534,123,43,56), Activity = c('Walking','Walking','Biking','Running','Running','Running', 'Biking'))
# Load lubridate and data.table
library(lubridate)
library(data.table)
# Convert dataframe to a data.table
setDT(df)
df[, Date := as.Date(Date, format = '%m/%d/%Y')] # Convert data to a column of Class Date -- check with class(df[, Date]) if you are unsure
df[, Date := floor_date(Date, unit = 'month')] # Reduce all dates to the first day of the month for summing later on
# Create ggplot object using data.tables functionality to sum the miles
ggplot(df[, sum(Miles), by = .(Date, Activity)], aes(x = Date, y = V1, colour = factor(Activity))) + # Data.table creates the column V1 which is the sum of miles
geom_line() +
scale_x_date(date_labels = '%b-%y') # %b is used to display the first 3 letters of the month
Upvotes: 2