ckp
ckp

Reputation: 593

Parsing Time from text using SAS

I have a maximum of 200 characters length of text for thousands of records which has date and time in it. I am trying to parse out the time

Here is the example of the text that I have

 Your account your account your account on  Jan 10, 2020 at 8.30 AM ET your account
 Your account your account your account on  Jan 3, 2020 6.30PM ET your account
 Your account your account your account on  Jan 11, 2020 at 6.30PM ET your account

Desired output

   8.30 AM
   6.30PM
   6.30PM  

In all the content ET is common and I am using index and substr function to parse out the time.

   Time=substr(Text,index(Text,' on ')+19,6);

For the second line, I am also getting extra characters since there is no at and no space between time and PM

Is there any efficient way to parse the time?

Thanks

Upvotes: 1

Views: 93

Answers (1)

Richard
Richard

Reputation: 27508

SAS can locate a text excerpt using a Perl regular expression that has a capture buffer.

data want(keep=parsed_timestring);
length parsed_timestring $8;
input;

  /* Pattern:
   *   On a word boundary     \b
   *   Capture start          (
   *     1 or 2 digits        \d{1,2}
   *     A period             \.
   *     1 or 2 digits        \d{1,2}
   *     0 or 1 spaces        \s?
   *     letter A or P        (A|P)
   *     letter M             M
   *   Capture end            )
   */

  prx = prxparse('/\b(\d{1,2}\.\d{1,2}\s?(A|P)M)/x');

  if prxmatch (prx, _infile_) then 
    parsed_timestring = prxposn(prx,1,_infile_);

datalines;
Your account your account your account on  Jan 10, 2020 at 8.30 AM ET your account
 Your account your account your account on  Jan 3, 2020 6.30PM ET your account
 Your account your account your account on  Jan 11, 2020 at 6.30PM ET your account
 Your account your account your account on  Jan 11, 2020 at 6666.30PM ET your account
;

proc print;
run;

enter image description here

In the last row parsed_timestring is blank because 6666.30PM starts with more than two digits on a word boundary and thus would not match the pattern and thus the line would not have a proper time string.

Upvotes: 1

Related Questions