
Reputation: 13

R - Share of Records by Day of Week?

For those familiar, I'm working on the Coursera bike share case study . . .

I'm working in R and have a data frame that is a log of all rides taken with a bicycle ride share company. A simplified version of the data frame is below.

started_at	ended_at	weekday	member_casual
2022-11-10 06:21:55	2022-11-10 06:31:27	Thursday	member
2022-11-04 07:31:55	2022-11-04 07:46:25	Friday	member
2022-11-21 17:20:29	2022-11-21 17:34:36	Monday	casual
2022-11-25 17:29:34	2022-11-25 17:45:15	Friday	member

I'd like to create a data frame (or, better yet, a visual such as a bar chart) that shows each weekday's share of rides, separately for members and casual riders. For example:

weekday	member	casual
Monday	12%	6%
Tuesday	15%	9%
Wednesday	15%	10%
Thursday	13%	14%
Friday	14%	18%
Saturday	14%	20%
Sunday	17%	23%

What would be the easiest way to accomplish this?

Thanks!

I tried creating a data frame, but wasn't sure how to get each weekday's percent share of the whole within each member_casual group:

library(dplyr)

day_levels <- c("Sunday", "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday")

weekday_by_member <- p12m %>%
    filter(member_casual == "member") %>%
    count(weekday)
# arrange() has no `levels` argument; sort by a factor with the desired level order
sorted_weekday_by_member <- arrange(weekday_by_member, factor(weekday, levels = day_levels))

sorted_weekday_by_member


weekday_by_casual <- p12m %>%
    filter(member_casual == "casual") %>%
    count(weekday)
sorted_weekday_by_casual <- arrange(weekday_by_casual, factor(weekday, levels = day_levels))

sorted_weekday_by_casual


weekday_mem_cas <- p12m %>%
    group_by(member_casual) %>%
    count(weekday)

weekday_mem_cas
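The missing step is to divide each weekday count by the total for its member_casual group, which happens automatically when the division is done after group_by(). A minimal sketch, assuming dplyr and tidyr are installed; rides is a hypothetical stand-in for p12m and day_levels is an illustrative name:

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data standing in for p12m (only the columns used here)
rides <- tibble(
  weekday = c("Thursday", "Friday", "Monday", "Friday", "Monday", "Friday"),
  member_casual = c("member", "member", "casual", "member", "casual", "casual")
)

day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")

share_by_day <- rides %>%
  # factor so weekdays sort Monday..Sunday instead of alphabetically
  mutate(weekday = factor(weekday, levels = day_levels)) %>%
  count(member_casual, weekday) %>%
  group_by(member_casual) %>%
  mutate(share = n / sum(n)) %>%   # share of that rider type's rides
  ungroup() %>%
  select(-n) %>%
  # one column per rider type; days missing for a type get 0
  pivot_wider(names_from = member_casual, values_from = share,
              values_fill = 0) %>%
  arrange(weekday)

share_by_day
```

Each rider-type column sums to 1, matching the layout of the table in the question; scales::percent() can format the columns for display.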

Upvotes: 0

Answers (3)

AnilGoyal

Reputation: 26218

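Something like the following should work. Note that I tested it on only one dataset, whose columns are named starttime and usertype rather than started_at and member_casual, so rename accordingly.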


library(dplyr)
library(ggplot2)

# Counts per weekday with percent labels; `.by` in mutate() needs dplyr >= 1.1.0
df %>% 
  group_by(weekday = lubridate::wday(starttime, label = TRUE), usertype) %>% 
  summarise(trips = n(), .groups = 'drop') %>% 
  # share of each user type's trips that fall on each weekday
  mutate(percent = scales::percent(trips/sum(trips)), .by = usertype) %>% 
  ggplot(aes(x = weekday, y = trips, fill = usertype)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = percent), position = position_dodge(width = 0.9), vjust = -0.9)



Or, to plot the shares themselves on the y-axis:

df %>% 
  group_by(weekday = lubridate::wday(starttime, label = TRUE), usertype) %>% 
  summarise(trips = n(), .groups = 'drop') %>% 
  # keep percent numeric so it can be used as the y value
  mutate(percent = trips/sum(trips), .by = usertype) %>% 
  ggplot(aes(x = weekday, y = percent, fill = usertype)) +
  geom_col(position = "dodge") +
  geom_text(aes(label = scales::percent(percent)), position = position_dodge(width = 0.9), vjust = -0.9) +
  scale_y_continuous(labels = scales::percent)

Created on 2024-02-05 with reprex v2.0.2

Upvotes: 0

Hoel

Reputation: 729

library(tidyverse)
library(scales)

df <- read_csv("202004-divvy-tripdata.csv") 

df %>% 
  # replace started_at with the day name; week_start = 1 puts Monday first
  mutate(across(started_at, ~ lubridate::wday(.x, 
                                              label = TRUE, 
                                              week_start = 1, 
                                              abbr = FALSE))) %>% 
  ggplot() + 
  aes(x = started_at, fill = member_casual) + 
  geom_bar(position = position_dodge()) +
  # label: each bar's count as a share of all bars in the plot
  geom_text(aes(label = percent(after_stat(proportions(count)))),
            stat = "count", 
            position = position_dodge(width = 0.9), 
            vjust = -1)
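Here proportions(count) inside after_stat() is evaluated over all bars in the layer, so each label is that bar's share of all rides, not its share within member_casual as in the question's table. One way to get the within-group shares, sketched on hypothetical toy data (rides and day_levels are illustrative names), is to map group = member_casual so that stat_count's computed prop variable is calculated separately for each rider type:

```r
library(ggplot2)
library(scales)

# Hypothetical toy data; weekday already extracted as a factor in Monday..Sunday order
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
rides <- data.frame(
  weekday = factor(c("Monday", "Monday", "Tuesday", "Friday", "Friday", "Friday"),
                   levels = day_levels),
  member_casual = c("member", "casual", "member", "member", "casual", "casual")
)

# group = member_casual makes stat_count compute `prop` within each rider type
p <- ggplot(rides, aes(x = weekday, fill = member_casual, group = member_casual)) +
  geom_bar(aes(y = after_stat(prop)), position = position_dodge()) +
  geom_text(aes(y = after_stat(prop), label = percent(after_stat(prop))),
            stat = "count", position = position_dodge(width = 0.9), vjust = -0.5) +
  scale_y_continuous(labels = percent)

p
```

The bar heights are now the shares themselves, so within each colour they add up to 100%.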

Upvotes: 0

jay.sf

Reputation: 72883

You can do this quite concisely using xtabs, barplot, and proportions for the percent labels.

> xtb <- xtabs(~ member_casual + weekday, dat)
> # share of all rides in each cell, rounded to one decimal place
> prp <- paste0(round(proportions(xtb)*100, 1), '%')
> 
> b <- xtb |> 
+   barplot(beside=TRUE, col=c(4, 2), ylim=c(0, max(xtb) + 2), legend.text=rownames(xtb))
> text(b, xtb + 1, labels=prp, cex=.8)

Data:

> set.seed(42)
> n <- 100
> s <- sample(seq.POSIXt(as.POSIXct('2022-01-01'), as.POSIXct('2022-12-31'), 'secs'), n)
> dat <- data.frame(
+   started_at=s,
+   ended_at=s + sample.int(600, length(s))
+ ) |> 
+   transform(weekday=strftime(started_at, '%A'),
+             member_casual=c('member', 'casual'))
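proportions(xtb) divides by the grand total, so those labels are shares of all rides. For the per-rider-type shares in the question, proportions() also takes a margin argument (base R 4.0.0 and later); margin = 1 divides each row of the member_casual-by-weekday table by that row's total. A sketch on hypothetical toy data (rides and day_levels are illustrative names), converting weekday to a factor so the days run Monday to Sunday instead of alphabetically:

```r
# Hypothetical toy data with the question's column names
day_levels <- c("Monday", "Tuesday", "Wednesday", "Thursday",
                "Friday", "Saturday", "Sunday")
rides <- data.frame(
  weekday = factor(c("Thursday", "Friday", "Monday", "Friday", "Monday", "Sunday"),
                   levels = day_levels),
  member_casual = c("member", "member", "casual", "member", "casual", "casual")
)

xtb <- xtabs(~ member_casual + weekday, rides)
# margin = 1: divide each row (rider type) by its own total
shares <- proportions(xtb, margin = 1)
pct <- round(100 * shares, 1)

# Weekdays as rows, rider types as columns, as in the question's table
t(pct)

b <- barplot(shares, beside = TRUE, col = c(4, 2),
             ylim = c(0, max(shares) * 1.15), legend.text = rownames(shares))
# pos = 3 places each label just above its bar
text(b, shares, labels = paste0(pct, "%"), pos = 3, cex = 0.8)
```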

Upvotes: 0
