easymoneysniper
easymoneysniper

Reputation: 35

Count the number of records per month in R

Given Merchant ID, Card Number and Date, I need to count the number of Card Numbers associated to each Merchant ID per month. So far I can only count the number of Card Numbers per Merchant ID but I cannot count it yet per Month using this code:

HitsCounter <- summarise(group_by(df, MerchantID), HitsTotal = n_distinct(CardNum))

Reproducible Data:

MerchantID <- c('1234, '1234', '1234', '1234')
CardNum <- c('1abc1', '1abc1', '2xyz2', '3ijk3')
Date <- c('2020-05-07', '2020-05-07', '2019-10-12', '2019-10-25')
df <- data.frame(MerchantID, CardNum, Date)

What I get when I merge HitsCounter to my dataframe df:

MerchantID  CardNum     Date        HitsTotal
  1234       1abc1   2020-05-07         3
  1234       1abc1   2020-05-07         3
  1234       2xyz2   2019-10-12         3
  1234       3ijk3   2019-10-25         3
 

Expected Output Data:

MerchantID  CardNum     Date        HitsPerMonth
  1234       1abc1   2020-05-07          2
  1234       1abc1   2020-02-07          1
  1234       2xyz2   2020-05-12          2
  1234       3ijk3   2019-10-25          1
  1234       3ijk3   2019-10-01          1      

Take note that the logic should completely disregard any duplicate CardNumber even with a different month since a "Hit" is counted per unique CardNumber.

To put it simply, it should answer the question:

How many Unique Card Numbers transacted with the same Merchant ID in the same month?

Upvotes: 1

Views: 1877

Answers (2)

jay.sf
jay.sf

Reputation: 72603

You may use ave and calculate the unique lengths.

df <- transform(df, hits.pm=ave(CardNum, MerchantID, substr(Date, 6, 7), 
                                FUN=function(x) length(unique(x))))
df
#   MerchantID CardNum       Date hits.pm
# 1       1234   1abc1 2020-05-07       1
# 2       1234   1abc1 2020-05-07       1
# 3       1234   2xyz2 2019-10-12       2
# 4       1234   3ijk3 2019-10-25       2
# 5       1235   1abc1 2020-05-07       1
# 6       1236   1abc1 2020-05-07       1
# 7       1235   2xyz2 2019-10-12       2
# 8       1235   3ijk3 2019-10-25       2

Data:

df <- read.table(header=T, text="  MerchantID CardNum       Date
1       1234   1abc1 2020-05-07
2       1234   1abc1 2020-05-07
3       1234   2xyz2 2019-10-12
4       1234   3ijk3 2019-10-25
5       1235   43bc1 2020-05-07
6       1235   foo12 2020-05-07
7       1236   foo34 2019-10-12
8       1236   foo34 2019-10-25
                 ")

Upvotes: 0

Ronak Shah
Ronak Shah

Reputation: 388807

You can extract year-month value from date and count the unique CardNum values for each MerchantID and month.

library(dplyr)

df %>%
  mutate(Date = as.Date(Date), 
         month = format(Date, "%Y-%m")) %>%
  group_by(MerchantID, month) %>%
  mutate(HitsPerMonth = n_distinct(CardNum))

If a CardNum has to be calculated only once for each MerchantID irrespective of the month then you can do :

df %>%
  mutate(Date = as.Date(Date), 
         month = format(Date, "%Y-%m")) %>%
  group_by(MerchantID) %>%
  mutate(CardNum = replace(CardNum, duplicated(CardNum), NA)) %>%
  group_by(month, .add  =TRUE) %>%
  mutate(HitsPerMonth = n_distinct(na.omit(CardNum)))

Upvotes: 2

Related Questions