Reputation: 593
I have a maximum of 200 characters length of text for thousands of records which has date and time in it. I am trying to parse out the time
Here is the example of the text
that I have
Your account your account your account on Jan 10, 2020 at 8.30 AM ET your account
Your account your account your account on Jan 3, 2020 6.30PM ET your account
Your account your account your account on Jan 11, 2020 at 6.30PM ET your account
Desired output
8.30 AM
6.30PM
6.30PM
In all the content ET
is common and I am using index
and substr
function to parse out the time.
Time=substr(Text,index(Text,' on ')+19,6);
For the second line, I am also getting extra characters since there is no at
and no space between time and PM
Is there any efficient way to parse the time?
Thanks
Upvotes: 1
Views: 93
Reputation: 27508
SAS can locate a text excerpt using a Perl regular expression that has a capture buffer.
data want(keep=parsed_timestring);
length parsed_timestring $8;
input;
/* Pattern:
* On a word boundary \b
* Capture start (
* 1 or 2 digits \d{1,2}
* A period \.
* 1 or 2 digits \d{1,2}
* 0 or 1 spaces \s?
* letter A or P (A|P)
* letter M M
* Capture end )
*/
prx = prxparse('/\b(\d{1,2}\.\d{1,2}\s?(A|P)M)/x');
if prxmatch (prx, _infile_) then
parsed_timestring = prxposn(prx,1,_infile_);
datalines;
Your account your account your account on Jan 10, 2020 at 8.30 AM ET your account
Your account your account your account on Jan 3, 2020 6.30PM ET your account
Your account your account your account on Jan 11, 2020 at 6.30PM ET your account
Your account your account your account on Jan 11, 2020 at 6666.30PM ET your account
;
proc print;
run;
In the last row parsed_timestring
is blank because 6666.30PM
starts with more than two digits on a word boundary and thus would not match the pattern and thus the line would not have a proper time string.
Upvotes: 1