Reputation: 35
Given Merchant ID, Card Number and Date, I need to count the number of Card Numbers associated to each Merchant ID per month. So far I can only count the number of Card Numbers per Merchant ID but I cannot count it yet per Month using this code:
HitsCounter <- summarise(group_by(df, MerchantID), HitsTotal = n_distinct(CardNum))
Reproducible Data:
MerchantID <- c('1234, '1234', '1234', '1234')
CardNum <- c('1abc1', '1abc1', '2xyz2', '3ijk3')
Date <- c('2020-05-07', '2020-05-07', '2019-10-12', '2019-10-25')
df <- data.frame(MerchantID, CardNum, Date)
What I get when I merge HitsCounter to my dataframe df:
MerchantID CardNum Date HitsTotal
1234 1abc1 2020-05-07 3
1234 1abc1 2020-05-07 3
1234 2xyz2 2019-10-12 3
1234 3ijk3 2019-10-25 3
Expected Output Data:
MerchantID CardNum Date HitsPerMonth
1234 1abc1 2020-05-07 2
1234 1abc1 2020-02-07 1
1234 2xyz2 2020-05-12 2
1234 3ijk3 2019-10-25 1
1234 3ijk3 2019-10-01 1
Take note that the logic should completely disregard any duplicate CardNumber even with a different month since a "Hit" is counted per unique CardNumber.
To put it simply, it should answer the question:
How many Unique Card Numbers transacted with the same Merchant ID in the same month?
Upvotes: 1
Views: 1877
Reputation: 72603
You may use ave
and calculate the unique lengths.
df <- transform(df, hits.pm=ave(CardNum, MerchantID, substr(Date, 6, 7),
FUN=function(x) length(unique(x))))
df
# MerchantID CardNum Date hits.pm
# 1 1234 1abc1 2020-05-07 1
# 2 1234 1abc1 2020-05-07 1
# 3 1234 2xyz2 2019-10-12 2
# 4 1234 3ijk3 2019-10-25 2
# 5 1235 1abc1 2020-05-07 1
# 6 1236 1abc1 2020-05-07 1
# 7 1235 2xyz2 2019-10-12 2
# 8 1235 3ijk3 2019-10-25 2
Data:
df <- read.table(header=T, text=" MerchantID CardNum Date
1 1234 1abc1 2020-05-07
2 1234 1abc1 2020-05-07
3 1234 2xyz2 2019-10-12
4 1234 3ijk3 2019-10-25
5 1235 43bc1 2020-05-07
6 1235 foo12 2020-05-07
7 1236 foo34 2019-10-12
8 1236 foo34 2019-10-25
")
Upvotes: 0
Reputation: 388807
You can extract year-month value from date and count the unique CardNum
values for each MerchantID
and month
.
library(dplyr)
df %>%
mutate(Date = as.Date(Date),
month = format(Date, "%Y-%m")) %>%
group_by(MerchantID, month) %>%
mutate(HitsPerMonth = n_distinct(CardNum))
If a CardNum
has to be calculated only once for each MerchantID
irrespective of the month then you can do :
df %>%
mutate(Date = as.Date(Date),
month = format(Date, "%Y-%m")) %>%
group_by(MerchantID) %>%
mutate(CardNum = replace(CardNum, duplicated(CardNum), NA)) %>%
group_by(month, .add =TRUE) %>%
mutate(HitsPerMonth = n_distinct(na.omit(CardNum)))
Upvotes: 2