Alina

Reputation: 23

Count variable values using a unique key of multiple variables in R

I have a data frame with the variables STORE, SALES_DT, REGISTER, TRANS_ID, and PRODUCT.

Each unique combination of STORE, SALES_DT, REGISTER, and TRANS_ID identifies a transaction; TRANS_ID alone does not. For example, two transactions could share the same store, date, transaction ID, and product but occur at different registers. Any combination is possible. A very small portion of the data frame could look like this:

STORE   SALES_DT      REGISTER    TRANS_ID    PRODUCT
1       2017-04-12    3           1234        Milk
1       2017-04-12    3           1234        Milk
1       2014-06-01    14          8901        Eggs
23      2014-06-09    1           4597        Eggs
48      2016-01-25    2           1234        Bread
48      2015-12-09    2           8901        Milk

How do I make a count of unique transactions for each PRODUCT that would output something like this?

PRODUCT    
Milk     :2
Eggs     :2
Bread    :1

I have tried:

cart <- group_by(dataframe, STORE, SALES_DT, REGISTER, TRANS_ID)
summary(cart$PRODUCT)

but the count seems to ignore the group_by, since it outputs:

PRODUCT
MILK    :3
EGGS    :2
BREAD   :1

Upvotes: 1

Views: 154

Answers (1)

MrFlick

Reputation: 206177

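Use n_distinct to find the number of unique transactions. Since a transaction is identified by the combination of STORE, SALES_DT, REGISTER, and TRANS_ID rather than by TRANS_ID alone, pass all four columns to n_distinct:

library(dplyr)

# Count distinct transaction keys within each product
dataframe %>% group_by(PRODUCT) %>% 
  summarize(n = n_distinct(STORE, SALES_DT, REGISTER, TRANS_ID))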
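As a self-contained sketch, the sample rows from the question can be rebuilt and counted per product by the full four-column key. This assumes dplyr is installed and that SALES_DT is stored as a Date; the names by_key and by_rows are only illustrative. n_distinct accepts several vectors and counts their unique combinations, and the second approach (distinct followed by count) should give the same numbers.

```r
library(dplyr)

# Sample rows copied from the question (SALES_DT as Date is an assumption)
dataframe <- data.frame(
  STORE    = c(1, 1, 1, 23, 48, 48),
  SALES_DT = as.Date(c("2017-04-12", "2017-04-12", "2014-06-01",
                       "2014-06-09", "2016-01-25", "2015-12-09")),
  REGISTER = c(3, 3, 14, 1, 2, 2),
  TRANS_ID = c(1234, 1234, 8901, 4597, 1234, 8901),
  PRODUCT  = c("Milk", "Milk", "Eggs", "Eggs", "Bread", "Milk"),
  stringsAsFactors = FALSE
)

# Approach 1: count distinct key combinations within each product
by_key <- dataframe %>%
  group_by(PRODUCT) %>%
  summarize(n = n_distinct(STORE, SALES_DT, REGISTER, TRANS_ID))

# Approach 2: drop duplicate (key, product) rows, then count rows per product
by_rows <- dataframe %>%
  distinct(STORE, SALES_DT, REGISTER, TRANS_ID, PRODUCT) %>%
  count(PRODUCT)

print(by_key)
print(by_rows)
```

On this sample, both approaches count the two identical Milk rows as a single transaction, so Milk comes out as 2 rather than 3.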

Upvotes: 1
