Reputation: 353
I have a vector of strings as shown below. I want to extract dates.
check_values <- c("deficit based on wage statement 7/14/ to 7/17/2015",
"Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2",
"Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15;",
"Deficit due to wage statement: 4/22/15 thru 5/12/15",
"depos transcript 7/10/15 for 7/8/15 depos",
"difference owed for 4/25/15-5/22/15",
"tpd 4:30:2015 - 5:22:2015",
"Medical TREATMENT DATES: 6/30/2015 - 6/30/2015",
"4/25/15-5/22/15",
"Medical")
check_values <- as.data.table(check_values)
names(check_values) <- "check_memo"
Before extracting dates, I want to create a Date_Flag. I have used following code but it gives different output:
check_values$Date_Flag <- ifelse(grepl("([0-9]+)(/|-|:)([0-9]+)(/|-|:)([0-9]+)(/|-|:)", check_values$check_memo), 1, 0)
#[1] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE
#Expected Output: FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE FALSE
After this Date_Flag is created, I want to extract dates (both parts). Can somebody please tell me what is wrong with above regular regression?
Thanks
Upvotes: 2
Views: 1125
Reputation: 887108
We can use str_count
to create the 'Date_Flag' where the assumption is that if there are 2 full dates in each element of 'check_memo', we get TRUE or else FALSE.
library(data.table)
library(stringr)
pat <- "[0-9]{1,2}[/:][0-9]{1,2}[/:][0-9]{2,4}"
check_values[,Date_Flag := str_count(check_memo, pat)==2]
check_values
# check_memo Date_Flag
#1: deficit based on wage statement 7/14/ to 7/17/2015 FALSE
#2: Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2 FALSE
#3: Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15; TRUE
#4: Deficit due to wage statement: 4/22/15 thru 5/12/15 TRUE
#5: depos transcript 7/10/15 for 7/8/15 depos TRUE
#6: difference owed for 4/25/15-5/22/15 TRUE
#7: tpd 4:30:2015 - 5:22:2015 TRUE
#8: Medical TREATMENT DATES: 6/30/2015 - 6/30/2015 TRUE
#9: 4/25/15-5/22/15 TRUE
#10: Medical FALSE
If we need the Dates extracted, use the same pattern with str_extract_all
check_values[(Date_Flag), paste0("Date", 1:2) :=
transpose(str_extract_all(check_memo, pat))]
check_values
check_memo #Date_Flag Date1 Date2
# 1: deficit based on wage statement 7/14/ to 7/17/2015 FALSE NA NA
# 2: Deficit Due: $1205.73 -$879.63= $326.10 x 70%=$228.2 FALSE NA NA
# 3: Deficit Due for 12 wks pd - 7/14/15 thru 10/5/15; TRUE 7/14/15 10/5/15
# 4: Deficit due to wage statement: 4/22/15 thru 5/12/15 TRUE 4/22/15 5/12/15
# 5: depos transcript 7/10/15 for 7/8/15 depos TRUE 7/10/15 7/8/15
# 6: difference owed for 4/25/15-5/22/15 TRUE 4/25/15 5/22/15
# 7: tpd 4:30:2015 - 5:22:2015 TRUE 4:30:2015 5:22:2015
# 8: Medical TREATMENT DATES: 6/30/2015 - 6/30/2015 TRUE 6/30/2015 6/30/2015
# 9: 4/25/15-5/22/15 TRUE 4/25/15 5/22/15
#10: Medical FALSE NA NA
Upvotes: 3