Reputation: 489
I'm analysing app data in R and often need to group by time so I can plot it in ggplot, but this doesn't seem easy to do.
My data looks like:
user_id | session_id | timestamp  | time_seconds
001     | 123        | 2014-01-01 | 251
002     | 845        | 2014-01-01 | 514
003     | 741        | 2014-01-02 | 141
003     | 477        | 2014-01-03 | 221
004     | 121        | 2014-01-03 | 120
005     | 921        | 2014-01-04 | 60
...
The timestamp column was converted with as.Date(), so it should be recognised as a date by R.
I need to plot line graphs showing the number of sessions over time in ggplot. Is there a simple way to do this within the ggplot code? For example:
ggplot(df, aes(timestamp,count(session_id)))+
geom_line()
I want a count of sessions per date. The code above doesn't work; it is just an example to show what I'm after.
I'd then also like to summarise by month, and to look into specific months by subsetting the data. Can this be done from that line of code? xlim isn't what I'm after, as that just "shortens" the axis.
I've tried using the aggregate function, but with mixed results; it's not really what I've been after.
Thanks.
Upvotes: 1
Views: 2892
Reputation: 7724
You can use group_by and summarise from the dplyr package:
library(dplyr)
library(ggplot2)
df %>%
group_by(timestamp) %>%
summarise(session_count = n()) %>%
ggplot(aes(timestamp, session_count)) +
geom_line()
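If the count should happen inside the ggplot call itself, as the question asks, one option is to let ggplot2 do the tallying with stat = "count". The following is a minimal sketch rather than a definitive answer; the small df built here is a hypothetical stand-in that only mirrors the question's columns.

```r
library(ggplot2)

# Hypothetical sample mirroring the question's columns
df <- data.frame(
  session_id = c("123", "845", "741", "477", "121", "921"),
  timestamp  = as.Date(c("2014-01-01", "2014-01-01", "2014-01-02",
                         "2014-01-03", "2014-01-03", "2014-01-04"))
)

# stat = "count" tallies the rows for each distinct x value, so no
# y aesthetic and no separate summarising step is needed
p <- ggplot(df, aes(timestamp)) +
  geom_line(stat = "count")
p
```

This counts rows, so it matches the group_by/summarise version only when each row is one session.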
For summarizing the data by month you can do:
df %>%
  mutate(month_timestamp = format(timestamp, "%b %Y")) %>%
  group_by(month_timestamp) %>%
  summarise(session_count = n()) %>%
  # month_timestamp is a character column, so group = 1 is needed to connect the points
  ggplot(aes(month_timestamp, session_count, group = 1)) +
  geom_line()
The plot here doesn't show anything because there's only one month in your data, and a line needs at least two points.
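One thing to watch with format(timestamp, "%b %Y") is that it returns text, so the months sort alphabetically on the axis rather than chronologically. A possible workaround, sketched here under the assumption that one row equals one session, is to keep the month as a Date set to the first day of the month. The data below is hypothetical and spans four months so that there is something to plot; month_start, monthly and feb are illustrative names. The last step shows one way to look at a single month by filtering rows instead of adjusting the axis.

```r
library(dplyr)
library(ggplot2)

# Hypothetical data: one session per day for 120 days (Jan to Apr 2014)
df <- data.frame(
  session_id = seq_len(120),
  timestamp  = seq(as.Date("2014-01-01"), by = "day", length.out = 120)
)

monthly <- df %>%
  # Collapse each date to the first day of its month, keeping it a Date
  mutate(month_start = as.Date(format(timestamp, "%Y-%m-01"))) %>%
  group_by(month_start) %>%
  summarise(session_count = n())

# A Date x-axis sorts chronologically; date_labels controls the tick text
ggplot(monthly, aes(month_start, session_count)) +
  geom_line() +
  scale_x_date(date_labels = "%b %Y")

# To look at one month only, filter the rows before summarising or plotting
feb <- df %>%
  filter(timestamp >= as.Date("2014-02-01"),
         timestamp <  as.Date("2014-03-01"))
```

The filtered feb data frame can then go through the same daily group_by/summarise pipeline as above.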
Data
df <- structure(list(user_id = c("001", "002", "003", "003", "004", "005"),
session_id = c("123", "845", "741", "477", "121", "921"),
timestamp = structure(c(16071, 16071, 16072, 16073, 16073, 16074),
class = "Date"),
time_seconds = c(251, 514, 141, 221, 120, 60)),
.Names = c("user_id", "session_id", "timestamp", "time_seconds"),
class = c("tbl_df", "tbl", "data.frame"),
row.names = c(NA, -6L))
Upvotes: 1
Reputation: 5893
It might also be convenient to do this with lubridate, e.g.
library(tidyverse)
dat <- data.frame(timestamp = rep(seq.Date(as.Date("2014/01/01"), as.Date("2014/12/24"), "day"), each = 2),
sessions = 1)
dat %>%
mutate(month = format(timestamp, "%Y-%m")) %>%
group_by(month) %>%
summarise(sum_session = sum(sessions)) %>%
ggplot(aes(x = month, y = sum_session, group = 1)) + geom_line()
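The snippet above builds the month label with base R's format(). For a version that actually leans on lubridate, a rough sketch could use floor_date(), which rounds each date down to the start of its month and keeps the column as a Date, so group = 1 is no longer needed. The by_month name is only illustrative, and the data is the same simulated two-sessions-per-day set as above.

```r
library(dplyr)
library(ggplot2)
library(lubridate)

# Same simulated data as above: two sessions per day
dat <- data.frame(
  timestamp = rep(seq.Date(as.Date("2014-01-01"), as.Date("2014-12-24"), "day"),
                  each = 2),
  sessions = 1
)

by_month <- dat %>%
  # floor_date() rounds each date down to the first day of its month
  mutate(month = floor_date(timestamp, unit = "month")) %>%
  group_by(month) %>%
  summarise(sum_session = sum(sessions))

# month is a Date, so the axis is continuous and ordered in time
ggplot(by_month, aes(x = month, y = sum_session)) +
  geom_line()
```

Changing unit to "week" would give weekly totals with the same pipeline.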
Upvotes: 0