Fred Ng

Reputation: 143

Extract date and time from text using SAS

I have a .txt file with entries like this:

'random title'

random things , 00:00 AM, 1 January

2005, 555 words, (English)

'random long title'

random things , 00:00 AM, 1 January 2005, 111 words,

(English)

I need to extract the date and time in the formats yyyymmdd and hhmm. I tried using a comma as the delimiter:

DATA News;
  INFILE 'C:xxxx/xxxx/xxxx' DLM=',';
  INPUT Title $75. Time $10. Date $20. Words $15. Lang $10.;
PROC PRINT DATA=News;
  TITLE 'Time and Date';
  VAR Time Date;
RUN;

But it failed: the entries span multiple lines and are not consistently formatted.

Are there any solutions?

Upvotes: 2

Views: 598

Answers (1)

Joe

Reputation: 63424

If your dates are always formatted like 00:00 AM, 1 January 2005, you can use a Perl regular expression to find them.

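data test;
  input @;  /* read the next line into the input buffer (_infile_) without assigning variables */
  /* pattern: hh:mm, AM or PM, day of month, month name, four-digit year */
  _prx = prxparse('/\d\d:\d\d (?:AM|PM), \d{1,2} (?:January|February|March) \d{4}/');
  start = 1;
  stop = length(_infile_);
  /* walk through every match on the current line */
  call prxnext(_prx, start, stop, _infile_, position, length);
  do while (position > 0);
    found = substr(_infile_, position, length);
    put found= position= length=;
    call prxnext(_prx, start, stop, _infile_, position, length);
  end;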
datalines;
'random title'
random things , 00:00 AM, 1 January
2005, 555 words, (English)
'random long title'
random things , 00:00 AM, 1 January 2005, 111 words,
(English)
;;;;
run;

Then parse the FOUND value as you would any SAS character variable to obtain the date and time (or datetime). Extend my short list of months to cover all twelve.
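As a rough illustration of that last step, here is a minimal sketch that turns one FOUND value into the yyyymmdd and hhmm strings the question asks for. It assumes FOUND always has the shape matched by the regular expression above; the variable names (time_txt, date_txt, mon3, sasdate and so on) are made up for this example, and the 12-hour handling is my assumption about how AM/PM should be interpreted.

```sas
data _null_;
  length found $40 time_txt date_txt $20 ampm $2 mon3 $3 hhmm $4 yyyymmdd $8;
  found = '00:00 AM, 1 January 2005';      /* sample value from the question */

  /* split on the comma: time first, then date */
  time_txt = scan(found, 1, ',');          /* 00:00 AM       */
  date_txt = strip(scan(found, 2, ','));   /* 1 January 2005 */

  /* time: hour, minute and AM/PM marker, converted to a 24-hour clock */
  hour   = input(scan(time_txt, 1, ': '), 2.);
  minute = input(scan(time_txt, 2, ': '), 2.);
  ampm   = scan(time_txt, 3, ': ');
  if ampm = 'PM' and hour < 12 then hour = hour + 12;
  else if ampm = 'AM' and hour = 12 then hour = 0;
  hhmm = cats(put(hour, z2.), put(minute, z2.));

  /* date: rebuild a ddMMMyyyy string that the DATE9. informat can read */
  mon3 = substr(scan(date_txt, 2, ' '), 1, 3);
  sasdate = input(cats(scan(date_txt, 1, ' '), mon3, scan(date_txt, 3, ' ')), date9.);
  yyyymmdd = put(sasdate, yymmddn8.);

  put yyyymmdd= hhmm=;
run;
```

For the sample value this should write yyyymmdd=20050101 hhmm=0000 to the log. In a real program the same statements would sit inside the DO WHILE loop, right after FOUND is assigned.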

This finds the second example but not the first, where the date is split across two lines (that case is not reasonably findable using datalines in an example). If you are reading from a text file rather than datalines, you could manipulate the record format to remove the line feed and carriage return, so that the split date is seen as a single record and therefore matches. Look into RECFM=N for more details on that.
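If you would rather keep ordinary line-by-line reading, one alternative to RECFM=N (a different technique from the one suggested above, shown only as a sketch) is to search the previous line and the current line joined by a single space. This assumes a date never spans more than two lines and that a line break falls where a single space would otherwise be. The names dates, prev, line, window, pos and len are hypothetical; for a real file, replace DATALINES in the INFILE statement with your file path.

```sas
data dates (keep=found);
  infile datalines truncover;
  length prev line $256 window $513 found $40;
  retain prev _prx;
  /* compile the pattern once; extend the month list to all twelve months */
  if _n_ = 1 then
    _prx = prxparse('/\d\d:\d\d (?:AM|PM), \d{1,2} (?:January|February|March) \d{4}/');

  input line $char256.;
  line = strip(line);
  /* previous line + one space + current line (CATX skips prev while it is blank) */
  window = catx(' ', prev, line);

  start = 1;
  stop = length(window);
  call prxnext(_prx, start, stop, window, pos, len);
  do while (pos > 0);
    /* keep only matches that end in the current line; a match lying
       wholly inside prev was already written on the previous iteration */
    if pos + len - 1 > lengthn(prev) then do;
      found = substr(window, pos, len);
      output;
    end;
    call prxnext(_prx, start, stop, window, pos, len);
  end;
  prev = line;
datalines;
'random title'
random things , 00:00 AM, 1 January
2005, 555 words, (English)
'random long title'
random things , 00:00 AM, 1 January 2005, 111 words,
(English)
;;;;
run;
```

With the sample lines, both dates should end up as FOUND values in the dates data set, including the one split across two lines. The trade-off is that every line is searched twice, once as the current line and once as the previous one.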

Upvotes: 1
