Reputation: 65
I have a B2B customer dataset, and we want to measure how the onboarding of our customers progresses after we give them access to our webshop. A company can have many users who have been given access. I would like to create another column called "Onboarding", conditional on "First log-in date": if any user from a given company has logged in for the first time, we classify that company (customer) as onboarded with the value "Yes", otherwise "No". A "#" in the date column means the user has not logged in yet. I am unsure how to approach this in R. Can anyone help me, please? ^^
An example is attached as a picture:
Upvotes: 1
Views: 1065
Reputation: 5647
Is this what you wanted?
First, some data for a reproducible example:
dat <- data.frame(Company = c("A","A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"),
                  UserID = c("Simon","Hans","Jane","Alex","David","Dan","Sarah","Susan","Bob","Keith",
                             "Harry","Adam","Kenneth","Denial","Henna","John","Dylan"),
                  First_Log_in_Date = c("2018-02-22","#","2018-03-07","2018-04-29","#","#","#",
                                        "2018-05-01","2018-02-27","2018-06-08","2018-07-12",
                                        "2018-02-21","#","#","#","#","#"),
                  stringsAsFactors = FALSE)
To answer your original question, I would simply use the base ifelse() function:
dat$Onboarding <- ifelse(dat$First_Log_in_Date == "#", "No", "Yes")
This fills the "Onboarding" column with "Yes" or "No" for each user, depending on whether that user has a log-in date.
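As a variation on the per-user check, here is a minimal sketch that parses the column into real dates first, so that "#" becomes NA. The vector `first_login` is a hypothetical stand-in for `dat$First_Log_in_Date`, and it assumes every real date uses the `YYYY-MM-DD` format shown in the example data.

```r
# Hypothetical stand-in for dat$First_Log_in_Date
first_login <- c("2018-02-22", "#", "2018-03-07")

# With an explicit format, values that do not parse (here "#") become NA
login_date <- as.Date(first_login, format = "%Y-%m-%d")

# A user counts as logged in when a valid date is present
onboarding <- ifelse(is.na(login_date), "No", "Yes")
```

Having a proper Date column can also help later, for example if you want to measure how long onboarding took.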
To answer your second, company-level question, I would use functions from the "dplyr" package:
library(dplyr)
dat <- dat %>% group_by(Company) %>%
  mutate(onboarded = ifelse(any(First_Log_in_Date != "#"), "Yes", "No"))
This gives an "onboarded" column filled with "Yes" or "No" for every user of a company, depending on whether any employee of that company has a log-in date other than "#". Using any() also handles a company whose users all logged in on the same date.
The table will look like:
Upvotes: 1
Reputation: 486
In other words, for a given company if all users have First-Login Date as "#", the company has not been onboarded. Is that correct?
You can use the split-apply-combine approach for such problems:
#### Data ####
library(magrittr)  # provides the %>% pipe
my_df <- data.frame(Company = c("A","A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"),
                    UserID = c("Simon","Hans","Jane","Alex","David","Dan","Sarah","Susan","Bob","Keith",
                               "Harry","Adam","Kenneth","Denial","Henna","John","Dylan"),
                    First_Log_in_Date = c("2018-02-22","#","2018-03-07","2018-04-29","#","#","#",
                                          "2018-05-01","2018-02-27","2018-06-08","2018-07-12",
                                          "2018-02-21","#","#","#","#","#"))
#### Split - Apply - Combine ####
my_df %>% split(., .$Company) %>% lapply(function(company_df) {
  # Flag the whole company as onboarded if any of its users has logged in
  company_df$onboarded <- any(company_df$First_Log_in_Date != "#")
  company_df
}) %>% do.call(rbind, .)
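If you would rather not split and recombine the data frame, the same per-company check can probably be written with base R's ave(). This is only a sketch on a small, made-up subset of the data above; the column names follow the example.

```r
# Small hypothetical subset of the example data
my_df <- data.frame(
  Company = c("A", "A", "B", "B"),
  UserID = c("Simon", "Hans", "Dan", "Sarah"),
  First_Log_in_Date = c("2018-02-22", "#", "#", "#"),
  stringsAsFactors = FALSE
)

# TRUE for users who have a log-in date
logged_in <- my_df$First_Log_in_Date != "#"

# ave() applies any() within each company and repeats the result on every row
my_df$onboarded <- ave(logged_in, my_df$Company, FUN = any)
```

Here company A is flagged as onboarded on both of its rows because Simon has logged in, while company B is not.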
Upvotes: 0