I have the following data set:
zz <- "Date Token
20170120 12073300000000000000
20170120 18732300000000000000
20170120 15562500000000000000
20170120 13959500000000000000
20170120 13959500000000000000
20170121 13932200000000000000
20170121 10589400000000000000
20170121 15562500000000000000
20170121 13959500000000000000
20170121 13959500000000000000
20170121 10589400000000000000"
Data <- read.table(text=zz, header = TRUE)
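One caveat about reading this data: the tokens have 20 digits, which is more than a double can store exactly (roughly 15 to 17 significant digits). By default, `read.table()` parses them as numbers and silently rounds them, so real tokens that differ only in their last digits could merge. The sketch below assumes the tokens should be treated as opaque identifiers and reads the column as character through `colClasses`. The two tokens in `demo` are made up for illustration.

```r
# Two hypothetical tokens that differ only in the last digit
demo <- "Date Token
20170120 12073300000000000000
20170120 12073300000000000001"

# Default parsing: Token becomes a double and both values round to the same number
as_num <- read.table(text = demo, header = TRUE)
length(unique(as_num$Token))   # 1

# Reading Token as character keeps every digit, so the tokens stay distinct
as_chr <- read.table(text = demo, header = TRUE,
                     colClasses = c("integer", "character"))
length(unique(as_chr$Token))   # 2
```

The counting code in the answers below works the same way whether `Token` is numeric or character.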
I am trying to get the following statistics:
Date      # of Transactions  Unique Token  New Token
20170120  5                  4             4
20170121  6                  4             3
# of Transactions: total number of transactions (including duplicate tokens)
Unique Token: number of distinct tokens (no duplicates)
New Token: tokens that are not repeated from other dates
Edit 1: New Token: on the first day, all unique tokens are new tokens. From the next day onward, I need to compare each day's unique tokens with the previous day; a token that did not appear on the previous day is a new token for that day.
Edit 2: Essentially, I have one month of data, and for each of those 30 days I am trying to find the new tokens, to see whether the number of new tokens has improved from day to day.
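A minimal base-R sketch of the Edit 1 definition follows. It assumes that "new" means a token that did not appear on the immediately previous date, and it keeps `Token` as character. It reports two versions of the new-token count because the expected table seems to mix them. `NewTokens` counts distinct new tokens. `NewTokenTxns`, a hypothetical extra column, counts the transactions that use a new token.

```r
# Sample data from the question, with Token kept as character
Data <- data.frame(
  Date  = rep(c(20170120L, 20170121L), times = c(5, 6)),
  Token = c("12073300000000000000", "18732300000000000000",
            "15562500000000000000", "13959500000000000000",
            "13959500000000000000", "13932200000000000000",
            "10589400000000000000", "15562500000000000000",
            "13959500000000000000", "13959500000000000000",
            "10589400000000000000"),
  stringsAsFactors = FALSE
)

# One character vector of tokens per date; split() orders the groups by date
by_date <- split(Data$Token, Data$Date)

# Tokens from the previous date (an empty vector for the first date)
prev <- c(list(character(0)), by_date[-length(by_date)])

stats <- data.frame(
  Date         = as.integer(names(by_date)),
  Transactions = lengths(by_date),
  UniqueTokens = vapply(by_date, function(t) length(unique(t)), integer(1)),
  # distinct tokens that did not appear on the previous date
  NewTokens    = mapply(function(t, p) length(setdiff(t, p)), by_date, prev),
  # transactions whose token did not appear on the previous date
  NewTokenTxns = mapply(function(t, p) sum(!t %in% p), by_date, prev),
  row.names = NULL
)
stats
```

On the sample data, `NewTokens` is 4 and 2, and `NewTokenTxns` is 5 and 3. So the 3 for 20170121 in the expected table corresponds to counting transactions. The 4 for 20170120 corresponds to counting distinct tokens.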
Here is a solution using dplyr and purrr. Note that I don't get the results you gave in your question, because there are only 2 unique new tokens on the second date (13932200000000000000 and 10589400000000000000):
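library(dplyr)

df <- Data %>%
  group_by(Date) %>%
  summarise(N_transac = n(),                   # all transactions on the date
            unique_token = n_distinct(Token),  # distinct tokens on the date
            tokens = list(Token)) %>%          # keep the date's tokens as a list column
  mutate(prev = lag(tokens, 1),                # tokens from the previous date
         # distinct tokens not seen on the previous date
         new = purrr::map2_int(tokens, prev, ~length(setdiff(.x, .y)))) %>%
  select(-tokens, -prev)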
df
# A tibble: 2 × 4
Date N_transac unique_token new
<int> <int> <int> <int>
1 20170120 5 4 4
2 20170121 6 4 2
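`lag(tokens, 1)` compares each date only with the immediately preceding date. As a result, a token last seen two or more days earlier counts as new again. Over the one-month range in Edit 2, "new" might instead mean "never seen on any earlier date", which is closer to "no repetition with other dates". The sketch below shows one possible way to compute that alternative definition. It flags each token's first occurrence after sorting by date. On the two sample dates, it gives the same 4 and 2.

```r
library(dplyr)

# Sample data from the question (Token kept as character to avoid rounding)
Data <- data.frame(
  Date  = rep(c(20170120L, 20170121L), times = c(5, 6)),
  Token = c("12073300000000000000", "18732300000000000000",
            "15562500000000000000", "13959500000000000000",
            "13959500000000000000", "13932200000000000000",
            "10589400000000000000", "15562500000000000000",
            "13959500000000000000", "13959500000000000000",
            "10589400000000000000"),
  stringsAsFactors = FALSE
)

res <- Data %>%
  arrange(Date) %>%                            # duplicated() below depends on row order
  mutate(first_seen = !duplicated(Token)) %>%  # TRUE only on a token's earliest row overall
  group_by(Date) %>%
  summarise(N_transac    = n(),
            unique_token = n_distinct(Token),
            new          = sum(first_seen))    # distinct tokens first seen on this date
res
```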
I think this will give what you want:
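library(dplyr)

Data %>%
  mutate(new.tk = !duplicated(Token)) %>%  # TRUE on each token's first row; assumes rows are in date order
  group_by(Date) %>%
  summarize(
    count = n(),
    unique = n_distinct(Token),
    # first date: distinct tokens; later dates: transactions using a token first seen on that date
    new = ifelse(Date[1] == Data$Date[1], sum(new.tk), sum(Token %in% Token[new.tk]))
  )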
# # A tibble: 2 × 4
# Date count unique new
# <int> <int> <int> <int>
# 1 20170120 5 4 4
# 2 20170121 6 4 3