Jamie
Jamie

Reputation: 553

Finding mean of variable across each month/year

I have a dataset that looks similar to this:

> dput(df)
structure(list(Date = c("3/23/21", "4/11/22", "6/30/22"), Banana_wasted = c(4L, 
2L, 5L), Apple_wasted = c(6L, 0L, 3L), Orange_wasted = c(1L, 
4L, 1L), Banana_ordered = c(5L, 7L, 7L), Apple_Ordered = c(9L, 
8L, 9L), Orange_ordered = c(5L, 6L, 6L), Banana_eaten = c(5L, 
5L, 6L), Apple_eaten = c(7L, 7L, 4L), Orange_eaten = c(8L, 8L, 
8L)), class = "data.frame", row.names = c(NA, -3L))

I want to find the % of fruit wasted per month/year (in relation to how many fruits were ordered). it should be: (Banana_wasted+Apple_wasted+Orange_wasted) / (Banana_ordered + Apple_ordered+ Orange_ordered)

So, for 3/21, it should be: (4+6+1/5+9+5)*100 = 57.9%

I would like to do this for every month of the year.

Upvotes: 0

Views: 56

Answers (3)

zephryl
zephryl

Reputation: 17254

Pivot longer to get single wasted and ordered columns across all fruits; use lubridate::floor_date() and mdy() to get months from Date; group by month; then sum and divide to get your percentages:

library(dplyr)
library(tidyr)
library(lubridate)

dat %>% 
  rename(Apple_ordered = Apple_Ordered) %>% # for consistent capitalization
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>% 
  group_by(month = floor_date(mdy(Date), "month")) %>% 
  summarize(pct_wasted = sum(wasted) / sum(ordered)) %>% 
  ungroup()

# # A tibble: 3 × 2
#   month      pct_wasted
#   <date>          <dbl>
# 1 2021-03-01      0.579
# 2 2022-04-01      0.286
# 3 2022-06-01      0.409

If you prefer character labels, use strftime() instead of floor_date(), and scales::percent() for the percentages:

library(scales)

dat %>% 
  rename(Apple_ordered = Apple_Ordered) %>% 
  pivot_longer(
    Banana_wasted:Orange_eaten,
    names_to = c("Fruit", ".value"),
    names_sep = "_"
  ) %>% 
  group_by(month = strftime(mdy(Date), "%B %Y")) %>% 
  summarize(pct_wasted = percent(sum(wasted) / sum(ordered), accuracy = 0.1)) %>% 
  ungroup()

# # A tibble: 3 × 2
#   month      pct_wasted
#   <chr>      <chr>     
# 1 April 2022 28.6%     
# 2 June 2022  40.9%     
# 3 March 2021 57.9% 

Upvotes: 0

Ruam Pimentel
Ruam Pimentel

Reputation: 1329

library(dplyr)
library(lubridate)


df %>% 
  mutate(Date = as.Date(Date, format = "%m/%d/%y"),
         pct_wasted = (Banana_wasted + Apple_wasted + Orange_wasted) / (Banana_ordered + Apple_Ordered + Orange_ordered) * 100) %>% 
  group_by(year = year(Date), month = month(Date)) %>% 
  summarize(avg_pct_wasted = mean(pct_wasted))


#> # A tibble: 3 × 3
#> # Groups:   year [2]
#>    year month avg_pct_wasted
#>   <dbl> <dbl>          <dbl>
#> 1  2021     3           57.9
#> 2  2022     4           28.6
#> 3  2022     6           40.9

Created on 2023-02-06 with reprex v2.0.2

Upvotes: 1

HoelR
HoelR

Reputation: 6583

library(tidyverse)

df %>%
  group_by(Date = floor_date(mdy(Date), "month")) %>%
  summarise(
    wasted = sum(across(contains("wasted"))) / sum(across(contains("ordered"))),
    wasted_eaten = sum(across(contains("wasted"))) / sum(across(contains("eaten")))
  )

# A tibble: 3 x 3
  Date       wasted wasted_eaten
  <date>      <dbl>        <dbl>
1 2021-03-01  0.579        0.579
2 2022-04-01  0.286        0.314
3 2022-06-01  0.409        0.523

Upvotes: 2

Related Questions