user205660

Reputation: 21

Regex on variable names in R (reformatting dataset)

I'm working with the Yelp dataset. Variable names are in the form "day.hour", so Fri.4 means Friday at 4am and Fri.22 means Friday at 10pm. Each variable's value is the number of checkins at that time.

I want to create a plot where I have 7 lines. Each line represents a day of the week and each line shows how the average checkins are trending by the hour of the day. So each line connects 24 points, and I have 7 lines.

Any help?

I would use dplyr, but I can't figure out how to get all the Monday variables together, all the Tuesday variables together, and so on, because the names are like Tue.01, Tue.02, etc. How do I do operations on the strings?

How my current dataset is formatted: it is 1x168, with one variable per day.hour combination.

Fri.0 114.35897
Sat.0 154.92308
Sun.0 153.96154
Wed.0 93.92308
Fri.1 124.29487
Sat.1 168.07692
Thu.1 105.96154
Wed.1 101.85897
Sat.2 175.00000
Sun.2 157.48718
Thu.2 105.97436
Wed.2 97.08974
Fri.3 108.46154
Sun.3 145.24359


Upvotes: 2

Views: 49

Answers (1)

Rui Barradas

Reputation: 76450

This can be done with a simple pipe directly into ggplot. There is no need to group_by the weekdays; mapping the weekday to the colour aesthetic, aes(colour = .), makes ggplot2 do the grouping.

library(dplyr)
library(stringr)
library(ggplot2)

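dh %>%
  # Split names such as "Fri.22" into the weekday ("Fri") and the hour (22)
  mutate(Weekday = str_extract(day.hour, "^[[:alpha:]]{3}"),
         Hour = as.integer(str_extract(day.hour, "[[:digit:]]+$"))) %>%
  # colour = Weekday draws one line per weekday
  ggplot(aes(x = Hour, y = value, colour = Weekday)) +
  geom_line()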
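
On the "how do I do operations on the strings" part: the same split can also be done in base R with sub(). This is only a minimal sketch, assuming every name has the form of a three-letter weekday, a dot, then the hour; the vector of names and the weekday level order below are illustrative, not taken from the full dataset.

```r
# A few names in the question's "day.hour" form (illustrative subset)
day.hour <- c("Fri.0", "Sat.1", "Fri.22", "Wed.3")

# Drop the dot and everything after it to keep the weekday
weekday <- sub("\\..*$", "", day.hour)

# Drop everything up to and including the dot to keep the hour
hour <- as.integer(sub("^.*\\.", "", day.hour))

# Factor with calendar-ordered levels, so a plot legend runs Mon..Sun
# instead of alphabetically (assumes these abbreviations match the data)
weekday <- factor(weekday,
                  levels = c("Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"))
```

Converting the weekday to a factor is optional; it only changes the order in which ggplot2 lists the lines in the legend.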


Data.

dh <- read.table(text = "
Fri.0 114.35897
Sat.0 154.92308
Sun.0 153.96154
Wed.0 93.92308
Fri.1 124.29487
Sat.1 168.07692
Thu.1 105.96154
Wed.1 101.85897
Sat.2 175.00000
Sun.2 157.48718
Thu.2 105.97436
Wed.2 97.08974
Fri.3 108.46154
Sun.3 145.24359
")

names(dh) <- c("day.hour", "value")
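
If the data really is a 1x168 data frame as described in the question, with one column per day.hour, it would first need to be reshaped into the two-column form used above. A possible sketch in base R follows; the name wide and its four columns are hypothetical stand-ins for the real data frame.

```r
# Hypothetical 1-row wide data frame, one column per day.hour
wide <- data.frame(Fri.0 = 114.35897, Sat.0 = 154.92308,
                   Fri.1 = 124.29487, Sat.1 = 168.07692)

# Column names become the day.hour key, the single row becomes the values
dh <- data.frame(day.hour = names(wide),
                 value = unlist(wide[1, ], use.names = FALSE))
```

The resulting dh has one row per day.hour and can be piped into the mutate() and ggplot() steps unchanged.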

Upvotes: 1
