Reputation: 11
I have a dataframe like below on users campaign with below details:
Columns are: email_address, response_date, campaign_name, state, suburb, postcode, magazine_subs_title, response_type
email_address response_date campaign_name state suburb postcode magazine_subs_title response_type
[email protected] 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened
[email protected] 03/10/2017 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered
[email protected] 03/10/2017 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened
[email protected] 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered
[email protected] 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered
From here, I want to extract the response_date where response_type = 'delivered' and specific for each campaigns and end up with below table:
email_address response_date campaign_name state suburb postcode magazine_subs_title response_type delivered_date
[email protected] 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened 25/01/2018 9:00
[email protected] 03/10/2017 14:00:00 PM 2017_Fashion_Show QLD Brisbane 4000 MHI delivered 03/10/2017 14:00:00 PM
[email protected] 03/10/2017 17:00:00 PM 2017_Fashion_Show QLD Brisbane 4000 MHI opened 03/10/2017 14:00:00 PM
[email protected] 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered 25/01/2018 9:00
[email protected] 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered 14/07/2017 11:00
Does that makes sense?
Any one know how to perform that sort of operation in R? Thank you
Upvotes: 0
Views: 250
Reputation: 20095
One approach could be using lubridate
, tidyr
and dplyr
.
The approach is to first prepare data. Read response_date
and Time
separately and then unite
those to column to make response_date
. Then use parse_date_time
to convert those two columns in to datetime format which is optional(as OP is not taking any decision on this date). Finally apply ifelse
to populate delivered_date
.
#Data
df <- read.table(text = "
email_address response_date Time campaign_name state suburb postcode magazine_subs_title response_type
[email protected] 1/02/2018 18:01 2018_Beauty_Acq NSW Sydney 2000 AGL opened
[email protected] 03/10/2017 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered
[email protected] 03/10/2017 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened
[email protected] 25/01/2018 9:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered
[email protected] 14/07/2017 11:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered", header=T, stringsAsFactor = F)
library(lubridate)
library(dplyr)
library(tidyr)
df %>%
unite("response_date", c("response_date", "Time"), sep= " ") %>%
mutate(response_date = parse_date_time(response_date, c("dmy HMS", "dmy HM"))) %>%
mutate(delivered_date = ifelse(grepl("delivered",response_type), as.character(response_date), NA)) %>%
group_by(campaign_name, state, suburb, postcode) %>%
fill(delivered_date) %>% ungroup() %>%
as.data.frame()
Result:
email_address response_date campaign_name state suburb postcode magazine_subs_title response_type delivered_date
#1 [email protected] 2017-07-14 11:00:00 2017_Fashion_Show NSW Sydney 2000 AGL delivered 2017-07-14 11:00:00
#2 [email protected] 2017-10-03 14:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI delivered 2017-10-03 14:00:00
#3 [email protected] 2017-10-03 17:00:00 2017_Fashion_Show QLD Brisbane 4000 MHI opened 2017-10-03 14:00:00
#4 [email protected] 2018-02-01 18:01:00 2018_Beauty_Acq NSW Sydney 2000 AGL opened <NA>
#5 [email protected] 2018-01-25 09:00:00 2018_Beauty_Acq NSW Sydney 2000 AGL delivered 2018-01-25 09:00:00
Upvotes: 1