Reputation: 1
I'm working on aggregating raw data to show historical sales to all customers over the past 10 years. Each row includes (1) the company name, (2) the part number, (3) the year the sale was made, and (4) the quantity. Sales to some of our top customers span many rows, one per invoice, each with the details listed above.
Company_1 2013 part_number_1 quantity_1
Company_1 2013 part_number_2 quantity_2
Company_1 2015 part_number_1 quantity_3
Company_2 2013 part_number_3 quantity_4
Company_2 2016 part_number_4 quantity_5
I'm looking to consolidate the data with company names in the first column and the years 2013 - 2023 in the remaining columns. If sales to a particular company take up 150 rows in the raw data, these should be aggregated into 1 row that breaks down the sales quantity per year.
| Company Name | 2013 | 2014 | 2015 |
| -------- | -------- | -------- | -------- |
| Company 1 | Quantity | Quantity | Quantity |
| Company 2 | Quantity | Quantity | Quantity |...
sales_data_clean <- sales_data_raw %>%
  rename(part_number = Base,
         year = Year,
         company_name = `Company Name`,
         quantity = Quantity)
sales_data_grouped <- sales_data_clean %>%
  select(company_name, part_number, year, quantity)
I wrote the above code, but can't figure out how to combine data for similar companies and create new columns for years 2013 - 2023 with quantities sold in the table.
Any help would be greatly appreciated.
Updated code, but still showing 1's instead of 0's.
library(readxl)
library(tidyverse)
sales_data_raw <- data.frame(
  year = c(2010, 2010, 2011, 2012, 2016, 2016, 2017, 2019),
  company_name = c("Company A", "Company B", "Company B", "Company C", "Company D", "Company E", "Company E", "Company E"),
  part_number = c("3200", "619", "619", "LR20", "O8M", "BA-10", "BA-10", "BA-10"),
  quantity = c(650, 69000, 31000, 500, 1000, 402, 1768, 6098)
)
sales_data_grouped <- sales_data_raw %>%
  filter(year %in% 2010:2023) %>%
  select(company_name, year, quantity) %>%
  group_by(company_name, year) %>%
  summarize(total = sum(quantity, na.rm = T)) %>%
  spread(year, total, fill = T)
Upvotes: 0
Views: 64
Reputation: 23
You didn't provide example data, but something like this should do the job:
library(dplyr)  # filter(), select(), group_by(), summarize()
library(tidyr)  # spread()
raw <- data.frame(company_name = sample(c("A","B","C","D"), 150, replace = TRUE),
                  part_number = sample(c("p1","p2","p3","p4"), 150, replace = TRUE),
                  year = sample(2010:2024, 150, replace = TRUE),
                  quantity = sample(1:10, 150, replace = TRUE))
agr <- raw %>%
  filter(year %in% 2013:2023) %>%
  select(company_name, year, quantity) %>%
  group_by(company_name, year) %>%
  summarize(total = sum(quantity, na.rm = TRUE)) %>%
  spread(year, total, fill = 0)  # fill is the value used for missing company/year cells
agr
# # A tibble: 4 x 12
# # Groups: company_name [4]
# company_name `2013` `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022` `2023`
# <chr> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int> <int>
# 1 A 18 6 18 1 5 2 6 17 15 2 2
# 2 B 9 28 18 32 11 33 9 7 8 9 15
# 3 C 7 16 17 11 22 9 7 20 31 21 17
# 4 D 18 3 21 20 18 16 1 13 14 35 10
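One thing to watch with spread(): the fill argument is the replacement value itself rather than an on/off switch, so passing TRUE ends up as 1 in a numeric column, which would explain 1's showing up where 0's are expected. A minimal sketch with a made-up three-row table (toy, with_true and with_zero are hypothetical names, not from the question):

```r
library(tidyr)

# Hypothetical toy table: company B has no sales in 2014
toy <- data.frame(
  company_name = c("A", "A", "B"),
  year = c(2013, 2014, 2013),
  total = c(5, 7, 3)
)

# TRUE is coerced to 1 when it is written into the numeric column
with_true <- spread(toy, year, total, fill = TRUE)

# 0 is what you want for "no sales that year"
with_zero <- spread(toy, year, total, fill = 0)

with_true
with_zero
```

Comparing the 2014 column for company B in the two results shows the difference.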
Upvotes: 0
Reputation: 2132
First of all, read this and always include sample or toy data in your questions: you'll get better and faster help.
If you only want to summarize the annual sales (quantity) of each company, without taking into account the parts (part_number), try this:
(PS. Toy data at the end)
Code:
library(tidyverse)
toy_sales %>%
  summarise(.by = c(company_name, year), quantity = sum(quantity)) %>%
  pivot_wider(names_from = year, values_from = quantity, names_prefix = "y_")
Output:
# A tibble: 3 × 7
company_name y_2010 y_2011 y_2012 y_2013 y_2014 y_2015
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 John_Beatles 5075 3875 2700 3975 4000 3250
2 Mick_Stones 1250 4025 4475 3200 2975 3450
3 Paul_Beatles 3225 3875 2075 3925 3650 3700
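If some companies have no sales in a given year, pivot_wider() leaves those cells as NA by default; its values_fill argument lets you put 0 there instead. Here is a rough sketch on the sample data posted in the question (sales_wide is just a name I picked, and I used group_by() so it also runs on dplyr versions without .by):

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question
sales_data_raw <- data.frame(
  year = c(2010, 2010, 2011, 2012, 2016, 2016, 2017, 2019),
  company_name = c("Company A", "Company B", "Company B", "Company C",
                   "Company D", "Company E", "Company E", "Company E"),
  part_number = c("3200", "619", "619", "LR20", "O8M", "BA-10", "BA-10", "BA-10"),
  quantity = c(650, 69000, 31000, 500, 1000, 402, 1768, 6098)
)

sales_wide <- sales_data_raw %>%
  filter(year %in% 2010:2023) %>%
  group_by(company_name, year) %>%
  # one total per company and year
  summarise(quantity = sum(quantity, na.rm = TRUE), .groups = "drop") %>%
  # one column per year; company/year pairs with no sales become 0
  pivot_wider(names_from = year, values_from = quantity,
              names_prefix = "y_", values_fill = 0)

sales_wide
```

Note that only years present in the data get a column, so a year in which nobody bought anything will not appear.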
References:
dplyr::summarise: link
tidyr::pivot_wider: link
Toy data:
set.seed(123)
sample_x <- \(x) sample(x, 250, replace = TRUE)
toy_sales <- tibble(
  company_name = sample_x(paste0(band_members$name, "_", band_members$band)),
  year = sample_x(2010:2015),
  part_number = sample_x(1:10),
  quantity = sample_x(seq.int(0, 500, by = 25))
) %>%
  arrange(across(everything()))
Upvotes: 0