Prasanna Nandakumar

Reputation: 4335

Count, distinct and no repetition in R

I have the following data set

zz <- "Date Token
20170120    12073300000000000000
20170120    18732300000000000000
20170120    15562500000000000000
20170120    13959500000000000000
20170120    13959500000000000000
20170121    13932200000000000000
20170121    10589400000000000000
20170121    15562500000000000000
20170121    13959500000000000000
20170121    13959500000000000000
20170121    10589400000000000000"

Data <- read.table(text=zz, header = TRUE)

I am trying to compute the following stats for each date:

Date       # of Transactions    Unique Token    New Token
20170120    5                    4                4
20170121    6                    4                3 

# of Transactions - total transactions (including duplicate tokens)
Unique Token - distinct tokens on that date (no duplicates)
New Token - tokens not repeated from other dates

Edit 1: New Token - on the first day, all unique tokens are new tokens. From the next day on, I need to compare each day's unique tokens with the previous day's; a token that is not repeated from the previous day is a new token for that day.

Edit 2: Essentially, I have one month of data, and for each of those 30 days I am trying to find how many tokens are new, to see whether the number of new tokens has improved on a daily basis.

Upvotes: 0

Views: 83

Answers (2)

FlorianGD

Reputation: 2436

Here is a solution using dplyr and purrr. Note that I don't get the result given in your question: the second date has only 2 unique new tokens, not 3.

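library(dplyr)  # purrr must also be installed; it is called as purrr::map2_int

df <- Data %>%
    group_by(Date) %>%
    summarise(N_transac = n(),                   # transactions per date
              unique_token = n_distinct(Token),  # distinct tokens per date
              tokens = list(Token)) %>%          # keep each date's tokens in a list column
    mutate(prev = lag(tokens, 1),                # tokens of the previous date
           # count tokens seen on this date but not on the previous one
           new = purrr::map2_int(tokens, prev, ~length(setdiff(.x, .y)))) %>%
    select(-tokens, -prev)
df
# A tibble: 2 × 4
      Date N_transac unique_token   new
     <int>     <int>        <int> <int>
1 20170120         5            4     4
2 20170121         6            4     2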
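
The code above compares each date only with the date immediately before it. With a month of data, "new" might instead be read as "never seen on any earlier date". That reading is an assumption, since the question can be taken either way. A minimal sketch under that assumption records the first date on which each token appears and counts, per date, the distinct tokens whose first date is that date. The names `first_seen`, `first_date` and `res_all_dates` are introduced here for illustration only.

```r
library(dplyr)

# Data from the question. Token is read as character so the 20-digit IDs
# are compared as exact strings rather than as floating-point numbers.
zz <- "Date Token
20170120    12073300000000000000
20170120    18732300000000000000
20170120    15562500000000000000
20170120    13959500000000000000
20170120    13959500000000000000
20170121    13932200000000000000
20170121    10589400000000000000
20170121    15562500000000000000
20170121    13959500000000000000
20170121    13959500000000000000
20170121    10589400000000000000"
Data <- read.table(text = zz, header = TRUE,
                   colClasses = c("integer", "character"))

# Earliest date on which each token occurs
first_seen <- Data %>%
    group_by(Token) %>%
    summarise(first_date = min(Date))

res_all_dates <- Data %>%
    left_join(first_seen, by = "Token") %>%
    group_by(Date) %>%
    summarise(N_transac    = n(),
              unique_token = n_distinct(Token),
              # distinct tokens whose first appearance is on this date
              new          = n_distinct(Token[Date == first_date]))
res_all_dates
# new is 4 for 20170120 and 2 for 20170121
```

On the two dates in the question this gives the same counts as the previous-day comparison; the two approaches can only differ once there are three or more dates.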

Upvotes: 1

mt1022

Reputation: 17289

I think this will give what you want:

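library(dplyr)

Data %>%
    # TRUE for the first occurrence of each token in the whole data set
    mutate(new.tk = !duplicated(Token)) %>%
    group_by(Date) %>%
    summarize(
        count = n(),
        unique = n_distinct(Token),
        # first date: number of first occurrences;
        # later dates: number of transactions whose token is first seen on that date
        new = ifelse(Date[1] == Data$Date[1], sum(new.tk), sum(Token %in% Token[new.tk]))
    )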

# # A tibble: 2 × 4
#       Date count unique   new
#      <int> <int>  <int> <int>
# 1 20170120     5      4     4
# 2 20170121     6      4     3
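
To see why the second date shows 3 here, it may help to separate the two quantities that the `ifelse` switches between. The sketch below, which is not part of the original answer, computes both for every date: `new_tokens` counts distinct tokens first seen on that date, and `new_transac` counts transactions involving such tokens. Both column names and `res_new` are hypothetical, and the `arrange(Date)` step is an added assumption so that "first occurrence" means "earliest date".

```r
library(dplyr)

# Data from the question
zz <- "Date Token
20170120    12073300000000000000
20170120    18732300000000000000
20170120    15562500000000000000
20170120    13959500000000000000
20170120    13959500000000000000
20170121    13932200000000000000
20170121    10589400000000000000
20170121    15562500000000000000
20170121    13959500000000000000
20170121    13959500000000000000
20170121    10589400000000000000"
Data <- read.table(text = zz, header = TRUE)

res_new <- Data %>%
    arrange(Date) %>%                         # rows must be in date order
    mutate(new.tk = !duplicated(Token)) %>%   # first occurrence of each token overall
    group_by(Date) %>%
    summarise(count       = n(),
              unique      = n_distinct(Token),
              new_tokens  = sum(new.tk),                    # distinct first-seen tokens
              new_transac = sum(Token %in% Token[new.tk]))  # transactions using them
res_new
# new_tokens is 4 and 2; new_transac is 5 and 3
```

On the second date, token 10589400000000000000 is new but appears in two transactions, which is why the transaction-based count is 3 while the distinct count is 2.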

Upvotes: 1
