Create a new column based on conditions in another column

I have a B2B customer dataset, and we want to measure the onboarding of our customers after giving them access to our webshop. A company can have many users who have been given access. I would like to create another column called "Onboarding", conditional on "First log-in date": if any user from a given company has logged in for the first time, the company (customer) is classified as onboarded with the value "Yes", otherwise "No". A "#" means the user has not logged in yet. I am unsure how to approach this in R. Can anyone help me, please?

An example is attached as a picture:

data frame

data frame with new column

Upvotes: 1

Views: 1065

Answers (2)

Manoj Kumar

Reputation: 5647

Is this what you wanted?

First, some data for a reproducible example:

dat <- data.frame(
  Company = c("A","A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"),
  UserID = c("Simon","Hans","Jane","Alex","David","Dan","Sarah","Susan","Bob","Keith",
             "Harry","Adam","Kenneth","Denial","Henna","John","Dylan"),
  First_Log_in_Date = c("2018-02-22","#","2018-03-07","2018-04-29","#","#","#",
                        "2018-05-01","2018-02-27","2018-06-08","2018-07-12",
                        "2018-02-21","#","#","#","#","#"),
  stringsAsFactors = FALSE
)

To answer your original question, I would simply use the base ifelse() function:

dat$Onboarding <- ifelse(dat$First_Log_in_Date == "#", "No", "Yes")

This fills the "Onboarding" column with "Yes" or "No" for each user, depending on whether that user has a log-in date.

To answer your second, company-level condition, I would use functions from the "dplyr" package:

library(dplyr)
dat <- dat %>% group_by(Company) %>%
  mutate(onboarded = ifelse(any(First_Log_in_Date != "#"), "Yes", "No")) %>%
  ungroup()

This gives an "onboarded" column that is "Yes" for every user of a company in which at least one user has a log-in date other than "#", and "No" otherwise.

The table will look like:

enter image description here

Upvotes: 1

Chaos

Reputation: 486

In other words, for a given company if all users have First-Login Date as "#", the company has not been onboarded. Is that correct?

You can use the split-apply-combine approach for such problems:

#### Data ####
my_df <- data.frame(
  Company = c("A","A","A","A","A","B","B","B","B","C","C","C","C","D","D","D","D"),
  UserID = c("Simon","Hans","Jane","Alex","David","Dan","Sarah","Susan","Bob","Keith",
             "Harry","Adam","Kenneth","Denial","Henna","John","Dylan"),
  First_Log_in_Date = c("2018-02-22","#","2018-03-07","2018-04-29","#","#","#",
                        "2018-05-01","2018-02-27","2018-06-08","2018-07-12",
                        "2018-02-21","#","#","#","#","#")
)

#### Split - Apply - Combine ####
library(magrittr)  # provides the %>% pipe

my_df %>% split(., .$Company) %>% lapply(function(company_df) {
    # TRUE if at least one user of this company has logged in, FALSE otherwise
    company_df$onboarded <- any(company_df$First_Log_in_Date != "#")
    company_df
}) %>% do.call(rbind, .)
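
If you want the "Yes"/"No" labels from the question rather than a logical column, the same per-company check can probably be written in base R with ave(). This is only a minimal sketch: the small `demo_df` data frame and the helper names `logged_in` and `any_logged_in` are made up for illustration, and "#" is assumed to be the only marker for "not logged in yet".

```r
# Hypothetical mini data set; "#" marks a user who has not logged in yet
demo_df <- data.frame(
  Company = c("A", "A", "B", "B"),
  UserID = c("Simon", "Hans", "Dan", "Sarah"),
  First_Log_in_Date = c("2018-02-22", "#", "#", "#"),
  stringsAsFactors = FALSE
)

# Per-user flag: has this user logged in at all?
logged_in <- demo_df$First_Log_in_Date != "#"

# ave() applies FUN within each company and returns a vector
# as long as the input, so it can be used directly as a column
any_logged_in <- ave(logged_in, demo_df$Company, FUN = any)

# Company-level label, repeated on every user row of that company
demo_df$Onboarding <- ifelse(any_logged_in, "Yes", "No")
```

Here company A has one user with a log-in date, so both of its rows get "Yes", while company B gets "No" on both rows.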

Upvotes: 0
