KalCapone

Reputation: 1

How to consolidate rows with similar names and create columns for different years?

I'm working on aggregating raw data to show historical sales to all customers over the past 10 years. Each row includes (1) the company name, (2) the part number, (3) the year the sale was made, and (4) the quantity. Sales to some of our top customers span many rows, one per invoice, each with the details listed above.

    Company_1      2013     part_number_1      quantity_1
    Company_1      2013     part_number_2      quantity_2
    Company_1      2015     part_number_1      quantity_3
    Company_2      2013     part_number_3      quantity_4
    Company_2      2016     part_number_4      quantity_5

I'm looking to consolidate the data with company names in the first column and the years 2013 - 2023 in the remaining columns. If sales to a particular company take up 150 rows in the raw data, they should be aggregated into 1 row that breaks down the sales quantity per year.

| Company Name | 2013     | 2014     | 2015     |
| --------     | -------- | -------- | -------- |
| Company 1    | Quantity | Quantity | Quantity |
| Company 2    | Quantity | Quantity | Quantity |...

sales_data_clean <- sales_data_raw %>% 
  rename(part_number = Base,
         year = Year,
         company_name = `Company Name`,
         quantity = Quantity)

sales_data_grouped <- sales_data_clean %>% 
  select(company_name, part_number, year, quantity)

I wrote the above code, but can't figure out how to combine the rows for the same company and create new columns for the years 2013 - 2023 holding the quantities sold.

Any help would be greatly appreciated.


Update: I tried the code below, but the missing company/year combinations still show 1's instead of 0's.

library(readxl)
library(tidyverse)

sales_data_raw <- data.frame(
  year = c(2010, 2010, 2011, 2012, 2016, 2016, 2017, 2019),
  company_name = c("Company A", "Company B", "Company B", "Company C", "Company D", "Company E", "Company E", "Company E"),
  part_number = c("3200", "619", "619", "LR20", "O8M", "BA-10", "BA-10", "BA-10"),
  quantity = c(650, 69000, 31000, 500, 1000, 402, 1768, 6098)
)

sales_data_grouped <- sales_data_raw %>% 
  filter(year%in%2010:2023) %>% 
  select(company_name, year, quantity) %>% 
  group_by(company_name, year) %>% 
  summarize(total=sum(quantity, 
  na.rm = T)) %>% 
  spread(year, total, fill = T)

Upvotes: 0

Views: 64

Answers (2)

Student

Reputation: 23

You didn't provide example data, but something like this should do the job:

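library(dplyr)  # filter(), select(), group_by(), summarize()
library(tidyr)  # spread()

# Random example data: 150 invoice rows across 4 companies
raw <- data.frame(company_name = sample(c("A","B","C","D"), 150, replace = TRUE),
                  part_number = sample(c("p1","p2","p3","p4"), 150, replace = TRUE),
                  year = sample(2010:2024, 150, replace = TRUE),
                  quantity = sample(1:10, 150, replace = TRUE))

# Total quantity per company and year, then one column per year.
# fill = 0 puts 0 in missing company/year cells (fill = T would be coerced to 1).
agr <- filter(raw, year %in% 2013:2023) %>%
  select(company_name, year, quantity) %>%
  group_by(company_name, year) %>%
  summarize(total = sum(quantity, na.rm = TRUE)) %>%
  spread(year, total, fill = 0)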

agr
# # A tibble: 4 x 12
# # Groups:   company_name [4]
#   company_name `2013` `2014` `2015` `2016` `2017` `2018` `2019` `2020` `2021` `2022` `2023`
#   <chr>         <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>  <int>
# 1 A                18      6     18      1      5      2      6     17     15      2      2
# 2 B                 9     28     18     32     11     33      9      7      8      9     15
# 3 C                 7     16     17     11     22      9      7     20     31     21     17
# 4 D                18      3     21     20     18     16      1     13     14     35     10
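
On the 1's mentioned in the question's update: `fill = T` passes the logical `TRUE`, which R coerces to 1 when it is written into a numeric column, so every company/year pair without sales shows 1. A minimal sketch, reusing the sample data from the question (the name `sales_wide` is only for illustration), with `fill = 0` instead:

```r
library(dplyr)
library(tidyr)

# Sample data copied from the question's update (part_number omitted, it is not used)
sales_data_raw <- data.frame(
  year = c(2010, 2010, 2011, 2012, 2016, 2016, 2017, 2019),
  company_name = c("Company A", "Company B", "Company B", "Company C",
                   "Company D", "Company E", "Company E", "Company E"),
  quantity = c(650, 69000, 31000, 500, 1000, 402, 1768, 6098)
)

sales_wide <- sales_data_raw %>%
  filter(year %in% 2010:2023) %>%
  group_by(company_name, year) %>%
  summarize(total = sum(quantity, na.rm = TRUE), .groups = "drop") %>%
  # fill = 0 writes a numeric zero into company/year cells with no sales
  spread(year, total, fill = 0)

sales_wide
```

`spread()` is superseded in current tidyr; `pivot_wider(names_from = year, values_from = total, values_fill = 0)` should give the same table.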

Upvotes: 0

Adriano Mello

Reputation: 2132

First of all, read this and always include sample or toy data in your questions: you'll get better and faster help.

If you only want to summarize the annual sales (quantity) of each company, without taking into account the parts (part_number), try this:
(PS. Toy data at the end)

Code:

library(tidyverse)

toy_sales %>% 
  summarise(.by = c(company_name, year), quantity = sum(quantity)) %>% 
  pivot_wider(names_from = year, values_from = quantity, names_prefix = "y_")

Output:

# A tibble: 3 × 7
  company_name y_2010 y_2011 y_2012 y_2013 y_2014 y_2015
  <chr>         <dbl>  <dbl>  <dbl>  <dbl>  <dbl>  <dbl>
1 John_Beatles   5075   3875   2700   3975   4000   3250
2 Mick_Stones    1250   4025   4475   3200   2975   3450
3 Paul_Beatles   3225   3875   2075   3925   3650   3700
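
If you also need a column for every year from 2013 to 2023, with 0 where a company had no sales, one option is to add `tidyr::complete()` before pivoting. This is only a sketch: the tiny `sales` table and the name `sales_wide` are made up for illustration, and `.by` assumes dplyr 1.1.0 or later.

```r
library(dplyr)
library(tidyr)

# Hypothetical miniature of the sales data (not the toy data from this answer)
sales <- tibble(
  company_name = c("Company A", "Company A", "Company A", "Company B"),
  year = c(2013L, 2013L, 2015L, 2016L),
  quantity = c(10, 5, 7, 3)
)

sales_wide <- sales %>%
  # One row per company and year with the summed quantity
  summarise(.by = c(company_name, year), quantity = sum(quantity)) %>%
  # Add the company/year pairs that are absent from the data, with quantity 0
  complete(company_name, year = 2013:2023, fill = list(quantity = 0)) %>%
  # One column per year, prefixed so the names are syntactic
  pivot_wider(names_from = year, values_from = quantity, names_prefix = "y_")

sales_wide
```

`values_fill = 0` in `pivot_wider()` alone would zero-fill the gaps, but it cannot create a column for a year in which nobody bought anything; `complete()` covers that case.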

References:
dplyr::summarise: link
tidyr::pivot_wider: link

Toy data:

set.seed(123)
sample_x <- \(x) sample(x, 250, replace = TRUE)

toy_sales <- tibble(
  company_name = sample_x(paste0(band_members$name, "_", band_members$band)),
  year = sample_x(2010:2015),
  part_number = sample_x(1:10),
  quantity = sample_x(seq.int(0, 500, by = 25))
) %>% 
  arrange(across(everything()))

Upvotes: 0
