David
David

Reputation: 43

Finding the count until the first time a number occurs in a variable for all the same values of another variable

My data looks something like this. What i want to do now is create a new column "Acceptance Count" where i want to find the number of times 0 has appeared in the "Acceptance" column before "1" appears, for every same value of "Customer ID"

  Customer ID | Acceptance | Discount
       211         0          1
       211         0          0
       211         1          1
       202         1          1
       202         0          0
       194         0          0
       202         0          1
       194         1          0
       194         0          1
       198         0          1

Also i want to have another new column "Discount Count", where it should be 1 if the first time 1 appears in the "Acceptance" for every "Customer ID", the discount value is also 1. If other wise, "Discount Count" should be 0. If the "Acceptance" is 0 for all values of a "Customer ID", then the "Acceptance Count" and "Discount Count" should both be N/A

** For example, Customer ID 211 accepts in the third instance so "Acceptance Count" should read 2, and "Acceptance" reads 1 for the first time when the Discount is 1, so "Discount Count" should read 1, Customer ID 194 accepts in the second instance so "Acceptance Count" is 1, and it accepts when Discount is 0, so "Discount Count" is 0.

Expected result:

  Customer ID |  Acceptance Count | Discount Count
       211              2                  1
       202              0                  1
       194              1                  0
       198             N/A                N/A

I hope i am able to be clear about what i am looking for, thanks a lot for all of your help.

Also, i have 250,000 different Customer IDs and 750,000 in total data entries.

Upvotes: 1

Views: 419

Answers (2)

Evan Friedland
Evan Friedland

Reputation: 3194

Here is an attempt but I believe both 198 and 202 have all zeros for their Acceptance, meaning both Acceptance Counts and Discount Counts should be NA.

# Your data
df <- structure(list(Customer_ID = c(211, 211, 211, 202, 202, 194, 202, 194, 194, 198), 
                     Acceptance = c(0, 0, 1, 0, 0, 0, 0, 1, 0, 0), 
                     Discount = c(1, 0, 1, 1, 0, 0, 1, 0, 1, 1)), 
                .Names = c("Customer_ID", "Acceptance", "Discount"),
                row.names = c(NA, -10L), class = "data.frame")

# Desired output
df.split <- split(df, df$Customer_ID)
out <- t(sapply(df.split, function(x){ 
  a.1 <- which(x$Acceptance == 1) 
  d.1 <- which(x$Discount == 1)
  A_C <- ifelse(length(a.1) > 0, max(a.1) - length(a.1), 0)
  D_C <- ifelse(any(a.1[1] == d.1), 1, ifelse(sum(x$Acceptance) == 0, NA, 0))
  A_C <- ifelse(is.na(D_C), NA, A_C)
  setNames(c(A_C, D_C), c("Acceptance_Count","Discount_Count"))
}))

out <- cbind.data.frame(Customer_ID = as.numeric(rownames(out)), out)
out <- out[order(match(rownames(out), df$Customer_ID)),]
rownames(out) <- NULL
out
#  Customer_ID Acceptance_Count Discount_Count
#1         211                2              1
#2         202                0             1
#3         194                1              0
#4         198               NA             NA

EDIT: Condition Clarification.

-if Acceptance has a 1, Acceptance_Count = sum zeros before 1

Acceptance Discount
         0        0
         0        0
         1        0
Acceptance_Count Discount_Count
         2        0

-if Discount has a 1 that is on the same row as Acceptance's first 1 -> Dicount_Count = 1

Acceptance Discount
         0        0
         0        0
         1        1
Acceptance_Count Discount_Count
         2        1

-if Acceptance has all zeros -> Acceptance_Count & Discount Count = NA

Acceptance Discount
         0        0
         0        0
         0        0
Acceptance_Count Discount_Count
         NA        NA

-Other possibilities

Acceptance Discount
         1        0
         0        1
         1        0
Acceptance_Count Discount_Count
         1        0

Acceptance Discount
         0        0
         0        1
         0        1
Acceptance_Count Discount_Count
         NA        NA

Upvotes: 0

RoyalTS
RoyalTS

Reputation: 10203

Somewhat hacky, but here's a solution that I think does what you want:

df %>% 
  group_by(customer_id) %>% 
  summarize(acceptance_count = ifelse(rle(Acceptance)$values[1] == 0 & rle(Acceptance)$values[2] == 1, rle(    Acceptance)$lengths[1], NA),
            discount_count = as.integer(Discount[min(which(Acceptance == 1))] == 1))

Upvotes: 0

Related Questions