theredrabbit
theredrabbit

Reputation: 25

Is there a way to count repeated observations using the summarize function in R?

I'm working with a data set that contains CustomerID, Sales_Rep, Product, and year columns. The problem I have with this dataset is that there is no unique Transaction Number. The data looks like this:

CustomerID        Sales Rep          Product            Year
301978        Richard Grayson       Product A           2017
302151        Maurin Thompkins      Product B           2018
301962        Wallace West          Product C           2019
301978        Richard Grayson       Product B           2018
402152        Maurin Thompkins      Product A           2017
501967        Wallace West          Product B           2017
301978        Richard Grayson       Product B           2018

What I'm trying to do is count how many transactions were made by each Sales Rep, per year by counting the number of Customer IDs that appear for each Sales Rep per year regardless if the customer ID is repeated, and then compile it into one data frame called "Count". I tried using the following functions in R:

Count <- Sales_Data %>% 
  group_by(Sales_Rep, year) %>%
  summarize(count(CustomerID))

but I get this error:

Error: Problem with `summarise()` input `..1`.
i `..1 = count(PatientID)`.
x no applicable method for 'count' applied to an object of class "c('integer', 'numeric')"

The result I want to produce is this:

Sales Rep                2017          2018            2019
Richard Grayson            1             2            
Maurin Thompkins           1             1            
Wallace West               1                             1

Can anybody help me?

Upvotes: 2

Views: 182

Answers (1)

Rui Barradas
Rui Barradas

Reputation: 76565

There is no need to group and summarise, function count does that in one step. Then reshape to wide format.

Sales_Data <- read.table(text = "
CustomerID        'Sales Rep'          Product            Year
301978        'Richard Grayson'       'Product A'           2017
302151        'Maurin Thompkins'      'Product B'           2018
301962        'Wallace West'          'Product C'           2019
301978        'Richard Grayson'       'Product B'           2018
402152        'Maurin Thompkins'      'Product A'           2017
501967        'Wallace West'          'Product B'           2017
301978        'Richard Grayson'       'Product B'           2018
", header = TRUE, check.names = FALSE)

suppressPackageStartupMessages({
  library(dplyr)
  library(tidyr)
})

Sales_Data %>% count(CustomerID)
#>   CustomerID n
#> 1     301962 1
#> 2     301978 3
#> 3     302151 1
#> 4     402152 1
#> 5     501967 1

Sales_Data %>% 
  count(`Sales Rep`, Year) %>%
  pivot_wider(id_cols = `Sales Rep`, names_from = Year, values_from = n)
#> # A tibble: 3 x 4
#>   `Sales Rep`      `2017` `2018` `2019`
#>   <chr>             <int>  <int>  <int>
#> 1 Maurin Thompkins      1      1     NA
#> 2 Richard Grayson       1      2     NA
#> 3 Wallace West          1     NA      1

Created on 2022-04-03 by the reprex package (v2.0.1)


Edit

To have the output column 'Sales Rep' in the same order as in the input data, coerce to factor setting the levels attribute to that original order. This is taken care of by unique. After pivoting, 'Sales Rep' can be coerced back to character, if needed. I have omitted this final step in the code that follows.

Sales_Data %>% 
  mutate(`Sales Rep` = factor(`Sales Rep`, levels = unique(`Sales Rep`))) %>%
  count(`Sales Rep`, Year) %>%
  pivot_wider(id_cols = `Sales Rep`, names_from = Year, values_from = n)
#> # A tibble: 3 x 4
#>   `Sales Rep`      `2017` `2018` `2019`
#>   <fct>             <int>  <int>  <int>
#> 1 Richard Grayson       1      2     NA
#> 2 Maurin Thompkins      1      1     NA
#> 3 Wallace West          1     NA      1

Created on 2022-04-05 by the reprex package (v2.0.1)

Upvotes: 2

Related Questions