Reputation: 297
I am new to R and I have problems with calculating the amount of bill for each month. I have the dataframe as below:
dat <- data.frame(
time = factor(c("Breakfast","Breakfast","Breakfast","Breakfast","Breakfast","Breakfast"), levels=c("Breakfast")), date=c("2020-01-20","2020-01-21","2020-01-22","2020-02-10","2020-02-11","2020-02-12"),
total_bill = c(12.7557,14.8,17.23,15.7,16.9,13.2)
)
My goal is to calculate the amount spending on the Breakfast
for each month so here we have two months and I want to get the total sum of January and February separately.
Any help for this would be much appreciated. Thank you!
Upvotes: 0
Views: 484
Reputation: 21440
Does this answer your question?
sums <- tapply(dat$total_bill, format(as.Date(dat$date), "%B"), sum)
February January
45.8000 44.7857
sums
is a list: so if you want to access, for example, the datum for February, you can do this:
sums[1]
February
45.8
Alternatively, you can convert sums
into a dataframe and access the monthly sums via the month names:
sums <- as.data.frame.list(tapply(dat$total_bill, format(as.Date(dat$date), "%B"), sum))
sums$February
45.8
Addition:
Another (fun) solution is via regex: you define the dates as a pattern and, using sub
plus backreference \\1
to recall the two numbers between the dashes, reduce them to the months part:
tapply(dat$total_bill, sub("\\d{4}-(\\d{2})-\\d{2}", "\\1", dat$date), sum)
01 02
44.7857 45.8000
Upvotes: 2
Reputation: 887891
We can convert the 'date' to Date
class, get the month
, and use that as grouping column and sum
the 'total_bill'
library(dplyr)
dat %>%
group_by(time, Month = format(as.Date(date), "%B")) %>%
summarise(total_bill = sum(total_bill, na.rm = TRUE))
# A tibble: 2 x 3
# Groups: time [1]
# time Month total_bill
# <fct> <chr> <dbl>
#1 Breakfast February 45.8
#2 Breakfast January 44.8
We can convert it to 'wide' format, if that is needed
library(tidyr)
out <- dat %>%
group_by(time, Month = format(as.Date(date), "%B")) %>%
summarise(total_bill = sum(total_bill, na.rm = TRUE)) %>%
pivot_wider(names_from = Month, values_from = total_bill)
out
# A tibble: 1 x 3
# Groups: time [1]
# time February January
# <fct> <dbl> <dbl>
# 1 Breakfast 45.8 44.8
If we also need to group by 'year'
out <- dat %>%
mutate(date = as.Date(date)) %>%
group_by(time, Year = format(date, "%Y"), Month = format(date, "%B")) %>%
summarise(total_bill = sum(total_bill, na.rm = TRUE))
Upvotes: 1
Reputation: 2435
library(dplyr)
d_sum <- dat %>%
group_by(substr(date, 0, 7)) %>%
summarise(sum = sum(total_bill))
d_sum
# A tibble: 2 x 2
`substr(date, 0, 7)` sum
<chr> <dbl>
1 2020-01 44.8
2 2020-02 45.8
Upvotes: 0