Reputation: 49
I am again here with an interesting problem.
I have a document like shown below:
"""UDAYA FILLING STATION ps\na MATTUPATTY ROAD oe\noe 4 MUNNAR Be:\nSeat 4 04865230318 Rat\nBree 4 ORIGINAL bepas e\n\noe: Han Die MC DE ER DC I se ek OO UO a Be ten\" % aot\n: ag 29-MAY-2019 14:02:23 [i\n— INVOICE NO: 292 hee fos\nae VEHICLE NO: NOT ENTERED Bea\nss NOZZLE NO : 1 ome\n- PRODUCT: PETROL ae\ne RATE : 75.01 INR/Ltr yee\n“| VOLUME: 1.33 Ltr ae\n~ 9 =6AMOUNT: 100.00 INR mae wae\nage, Ee pel Di EE I EE oe NE BE DO DC DE a De ee De ae Cate\notome S.1T. No : 27430268741C =. ver\nnes M.S.T. No: 27430268741V ae\n\nThank You! Visit Again\n""""
From the above document, I need to extract date highlighted in bold and Italics.
I tried with strpdate function but did not get the desired results.
Any help will be greatly appreciated.
Thanks in advance.
Upvotes: 0
Views: 42
Reputation: 521168
Assuming you only want to capture a single date, you may use sub
here:
text <- "UDAYA FILLING STATION ps\na MATTUPATTY ROAD oe\noe 4 MUNNAR Be:\nSeat 4 04865230318 Rat\nBree 4 ORIGINAL bepas e\n\noe: Han Die MC DE ER DC I se ek OO UO a Be ten\" % aot\n: ag 29-MAY-2019 14:02:23 [i\n— INVOICE NO: 292 hee fos\nae VEHICLE NO: NOT ENTERED Bea\nss NOZZLE NO : 1 ome\n- PRODUCT: PETROL ae\ne RATE : 75.01 INR/Ltr yee\n“| VOLUME: 1.33 Ltr ae\n~ 9 =6AMOUNT: 100.00 INR mae wae\nage, Ee pel Di EE I EE oe NE BE DO DC DE a De ee De ae Cate\notome S.1T. No : 27430268741C =. ver\nnes M.S.T. No: 27430268741V ae\n\nThank You! Visit Again\n"
date <- sub("^.*\\b(\\d{2}-[A-Z]+-\\d{4})\\b.*", "\\1", text)
date
[1] "29-MAY-2019"
If you had the need to match multiple such dates in your text, then you may use regmatches
along with regexec
:
text <- "Hello World 29-MAY-2019 Goodbye World 01-JAN-2018"
regmatches(text,regexec("\\b(\\d{2}-[A-Z]+-\\d{4})\\b", text))[[1]]
[1] "29-MAY-2019" "29-MAY-2019"
Upvotes: 1