Alex

Reputation: 313

Using scale_x_date in ggplot2 with different columns

Say I have the following data:

Date      Month  Year  Miles Activity
3/1/2014    3     2014  72   Walking
3/1/2014    3     2014  85   Running
3/2/2014    3     2014  42   Running
4/1/2014    4     2014  65   Biking
1/1/2015    1     2015  21   Walking
1/2/2015    1     2015  32   Running

I want to make graphs that display the total miles for each month, grouped and colored by activity. I know that I can make a separate data frame with the sum of the miles per month per activity, but the issue is in displaying it. Here in Excel is basically what I want--the sums displayed chronologically and colored by activity.

I know ggplot2 has a scale_x_date command, but I run into issues on "both sides" of the problem--if I use the Date column as my X variable, they're not summed. But if I sum my data how I want it in a separate data frame (i.e., where every activity for every month has just one row), I can't use both Month and Year as my x-axis--at least, not in any way that I can get scale_x_date to understand.

(And, I know, if Excel is graphing it correctly why not just use Excel--unfortunately, my data is so large that Excel was running very slowly and it's not feasible to keep using it.) Any ideas?

Upvotes: 1

Views: 607

Answers (1)

GarAust89

Reputation: 457

The below worked fine for me with the small dataset. If you convert your data.frame to a data.table, you can sum the miles per activity per month with just a couple of preprocessing steps. I've left some comments in the code to give you an idea of what's going on, but it should be pretty self-explanatory.

# Assuming your dataframe looks like this
df <- data.frame(Date = c('3/1/2014','3/1/2014','4/2/2014','5/1/2014','5/1/2014','6/1/2014','6/1/2014'),
                 Miles = c(72,14,131,534,123,43,56),
                 Activity = c('Walking','Walking','Biking','Running','Running','Running','Biking'))

# Load lubridate, data.table and ggplot2
library(lubridate)
library(data.table)
library(ggplot2)

# Convert dataframe to a data.table
setDT(df)
df[, Date := as.Date(Date, format = '%m/%d/%Y')] # Convert Date to a column of class Date -- check with class(df[, Date]) if you are unsure
df[, Date := floor_date(Date, unit = 'month')] # Reduce all dates to the first day of the month for summing later on

# Create ggplot object using data.table's functionality to sum the miles
ggplot(df[, sum(Miles), by = .(Date, Activity)], aes(x = Date, y = V1, colour = factor(Activity))) + # data.table names the unnamed sum column V1
  geom_line() +
  scale_x_date(date_labels = '%b-%y') # %b displays the abbreviated month name, %y the two-digit year
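
If you already have the summed data frame with separate Month and Year columns, one option is to build a proper Date column from them (pinned to the first of each month) so that scale_x_date has something to work with. A minimal sketch, assuming a hypothetical pre-summed frame called monthly whose totals I worked out by hand from the sample data in the question, and using bars rather than lines as a guess at the Excel chart:

```r
library(ggplot2)

# Hypothetical pre-summed data: one row per activity per month
# (totals taken from the six sample rows in the question)
monthly <- data.frame(
  Month    = c(3, 3, 4, 1, 1),
  Year     = c(2014, 2014, 2014, 2015, 2015),
  Activity = c("Walking", "Running", "Biking", "Walking", "Running"),
  Miles    = c(72, 127, 65, 21, 32)
)

# Combine Year and Month into a Date on the first day of that month
monthly$Date <- as.Date(paste(monthly$Year, monthly$Month, 1, sep = "-"),
                        format = "%Y-%m-%d")

# Date is now a real Date column, so scale_x_date orders it chronologically
p <- ggplot(monthly, aes(x = Date, y = Miles, fill = Activity)) +
  geom_col(position = "dodge") +
  scale_x_date(date_labels = "%b-%y")

# print(p) draws the chart
```

Swap geom_col for geom_line (and fill for colour) if you would rather have lines, as in the code above.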

Upvotes: 2
