Reputation: 101
I am trying to extract the most recent date that a report was added in an R dataframe of reports. The text always looks like Date Ordered: M/DD/YYYY
and may contain 0 many times in a given report. If it's repeating, I want the most recent (usually the last) instance, and I'm trying to convert it to a date in a mutated dplyr column.
Using the code below on my actual dataframe, I get the error:
Error in if (nchar(s) > 0 && substring(s, 1, 1) == "\002") { :
missing value where TRUE/FALSE needed
However, it runs fine on a single item making me think that it's trying to concatenate the entire column.
The test code doesn't give an error, but actually pulls the last date from the last report for all instances:
lastdate
1 1999-04-15
2 1999-04-15
dataset=data.frame(cbind(ID=c(001,002),
Report=c("Blah Blah Date Ordered: 5/19/2000 test is positive. Date Ordered: 4/2/2005 Additional testing negative.",
"Meh Date Ordered: 4/15/1999")),
stringsAsFactors = F)`
dataset %>%
mutate(lastdate = as.Date(last(gsub("Date Ordered:\\s+", "",
strapplyc(Report,
"Date Ordered:\\s*\\d+/\\d+/\\d+", simplify = TRUE))),
"%m/%d/%Y"))
Desired output should be:
2005-4-2
1999-4-15
Actual for dataset:
Error in if (nchar(s) > 0 && substring(s, 1, 1) == "\002") { :
missing value where TRUE/FALSE needed
Actual for test data:
lastdate
1 1999-04-15
2 1999-04-15
Upvotes: 0
Views: 403
Reputation: 627100
I suggest a gsub
like
dataset$lastsdate <- as.Date(gsub(".*Date Ordered:\\s*(\\d{1,2}/\\d{1,2}/\\d{4}).*|.*","\\1", dataset$Report),"%m/%d/%Y")
See the regex in action.
The regex matches:
.*
- any 0+ chars as many as possibleDate Ordered:
- a literal substring\s*
- 0+ whitespaces(\d{1,2}/\d{1,2}/\d{4})
- Capturing group 1 (\1
): 1 or 2 digits, /
, 1 or 2 digits, /
, 4 digits.*
- the rest of the string|
- or.*
- the entire string.Upvotes: 0