Reputation: 21
I'm working with the Yelp dataset. Variable names are in the form "day.hour", so Fri.4 means Friday at 4am and Fri.22 means Friday at 10pm. The variable value is the number of check-ins at that time.
I want to create a plot where I have 7 lines. Each line represents a day of the week and each line shows how the average checkins are trending by the hour of the day. So each line connects 24 points, and I have 7 lines.
Any help?
I would use dplyr but can't figure out how to get all the Monday variables together, all the Tuesday variables together, and so on, because the names are like Tue.01, Tue.02, etc. How do I do operations on the strings?
How my current dataset is formatted: it is 1x168 (each variable is a day.hour combination).
Fri.0 114.35897
Sat.0 154.92308
Sun.0 153.96154
Wed.0 93.92308
Fri.1 124.29487
Sat.1 168.07692
Thu.1 105.96154
Wed.1 101.85897
Sat.2 175.00000
Sun.2 157.48718
Thu.2 105.97436
Wed.2 97.08974
Fri.3 108.46154
Sun.3 145.24359
Upvotes: 2
Views: 49
Reputation: 76450
This can be done with a simple pipe directly into ggplot. There is no need to group_by the weekdays: mapping the weekday to the ggplot2 colour aesthetic, aes(colour = Weekday), will do the grouping.
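library(dplyr)
library(stringr)
library(ggplot2)

dh %>%
  # split "day.hour" into the three-letter weekday and the numeric hour
  mutate(Weekday = str_extract(day.hour, "^[[:alpha:]]{3}"),
         Hour = as.integer(str_extract(day.hour, "[[:digit:]]+$"))) %>%
  # one line per weekday, hour of day on the x axis
  ggplot(aes(x = Hour, y = value, colour = Weekday)) +
  geom_line()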
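As an alternative to the regular expressions, the names can probably also be split on the dot. The following is only a sketch: it assumes tidyr is installed, and the names dh_demo and dh_split are made up for illustration (the four values are copied from the question). Turning the weekday into a factor is optional and only fixes the legend order.

```r
library(dplyr)
library(tidyr)

# Small illustrative sample in the same "day.hour" format as the question
dh_demo <- data.frame(
  day.hour = c("Fri.0", "Sat.0", "Fri.1", "Sat.1"),
  value = c(114.35897, 154.92308, 124.29487, 168.07692)
)

dh_split <- dh_demo %>%
  # split on the literal dot; convert = TRUE turns the hour part into an integer
  separate(day.hour, into = c("Weekday", "Hour"), sep = "\\.", convert = TRUE) %>%
  # order the weekdays Monday to Sunday so the legend is not alphabetical
  mutate(Weekday = factor(Weekday,
                          levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun")))
```

The resulting data frame can be piped into the same ggplot call, with the hour column on the x axis and the weekday column as the colour.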
Data.
dh <- read.table(text = "
Fri.0 114.35897
Sat.0 154.92308
Sun.0 153.96154
Wed.0 93.92308
Fri.1 124.29487
Sat.1 168.07692
Thu.1 105.96154
Wed.1 101.85897
Sat.2 175.00000
Sun.2 157.48718
Thu.2 105.97436
Wed.2 97.08974
Fri.3 108.46154
Sun.3 145.24359
")
names(dh) <- c("day.hour", "value")
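The question mentions a 1x168 data set, while the data above is already in long format (one row per day.hour). If the real data is a single row with one column per day.hour, it would first need reshaping. A minimal sketch, assuming tidyr is available; the data frame wide below is hypothetical and shows only four of the 168 columns:

```r
library(tidyr)

# Hypothetical one-row wide data frame, one column per "day.hour"
wide <- data.frame(Fri.0 = 114.35897, Sat.0 = 154.92308,
                   Fri.1 = 124.29487, Sat.1 = 168.07692)

# Move the column names into a "day.hour" column and the cells into "value"
long <- pivot_longer(wide, cols = everything(),
                     names_to = "day.hour", values_to = "value")
```

The result has the same two columns as dh, so it should work with the pipe shown in the answer.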
Upvotes: 1