Reputation: 507
data test;
extract_string = "<some string here>";
my_result1 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "A1M_PRE");
my_result2 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "AC2_0M");
my_result3 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "GA3_30M");
my_result4 = prxchange(cat("s/^.*", extract_string, ".*$/$1/"), -1, "DE3_1H30M");
run;
Extract the number after _ but preceding M in strings that have M at the end. The result set should be:
my_result1 = ""
my_result2 = "0"
my_result3 = "30"
my_result4 = "30"
These extract_string values fail:
"\.*(\d*)M\b\"
"\.*(\d*?)M\b\"
"\.*(\d{*})M\b\"
"\.*(\d{*?})M\b\"
"\.*(\d){*}M\b\"
"\.*(\d){*?}M\b\"
"\.*(\d+)M\b\"
"\.*(\d+?)M\b\"
"\.*(\d{+})M\b\"
"\.*(\d{+?})M\b\"
"\.*(\d){+}M\b\"
"\.*(\d){+?}M\b\"
"\.*(\d+\d+)M\b\"
I have not found the right extract_string yet. Ideas?
cat("s/^.*", extract_string, ".*$/$1/") needs to be modified. Ideas?
prxposn(prxmatch(prxparse())) could be used instead of prxchange. How would that be formulated?
https://support.sas.com/rnd/base/datastep/perl_regexp/regexp-tip-sheet.pdf
https://www.pharmasug.org/proceedings/2013/CC/PharmaSUG-2013-CC35.pdf
SAS PRX to extract substring please
extracting substring using regex in sas
Extract substring from a string in SAS
The suffix in the cat function and the extract_string were modified.
data test;
extract_string = "?(?:_[^_\r\n]*?(\d+)M)?$";
my_result1 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "A1M_PRE");
my_result2 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "AC2_0M");
my_result3 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "GA3_30M");
my_result4 = prxchange(cat("s/^.*", extract_string, "/$1/"), -1, "DE3_1H30M");
run;
This solution uses the other prx-family functions: prxparse, prxmatch, and prxposn.
data have;
length string $10;
input string;
datalines;
A1M_PRE
AC2_0M
GA3_30M
DE3_1H30M
;
data want;
set have;
rxid = prxparse ('/_.*?(\d+)M\s*$/');
length digit_string $8;
if prxmatch (rxid, string) then digit_string = prxposn(rxid,1,string);
number_extracted = input (digit_string, ? 12.);
run;
Upvotes: 2
Views: 3037
Reputation: 27516
Use PRXPOSN to extract a match group.
Example:
Use pattern /_.*?(\d+)M\s*$/ to locate the last run of digits before a terminating M character.
Regex:
_ - literal underscore
.*? - non-greedy any characters
(\d+) - capture one or more digits
M - literal M
\s*$ - any number of trailing spaces, needed because SAS character values are right-padded with spaces to the variable's length
data have;
length string $10;
input string;
datalines;
A1M_PRE
AC2_0M
GA3_30M
DE3_1H30M
;
data want;
set have;
rxid = prxparse ('/_.*?(\d+)M\s*$/');
length digit_string $8;
if prxmatch (rxid, string) then digit_string = prxposn(rxid,1,string);
number_extracted = input (digit_string, ? 12.);
run;
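The role of the trailing \s* can be checked with a small experiment. The following is a minimal sketch, not part of the original answer: the names rx_padded, rx_strict, with_ws, no_ws, and no_ws_trim are made up for illustration, and the second pattern (without \s*) is an assumption added only for comparison.

```sas
data _null_;
  length string $10;
  input string;

  /* Pattern from the answer: allows the blanks that pad string to length 10 */
  rx_padded = prxparse('/_.*?(\d+)M\s*$/');
  /* Hypothetical comparison pattern: M must be the very last character */
  rx_strict = prxparse('/_.*?(\d+)M$/');

  length with_ws no_ws no_ws_trim $8;

  /* Padded value, pattern that tolerates trailing blanks */
  if prxmatch(rx_padded, string) then
    with_ws = prxposn(rx_padded, 1, string);

  /* Padded value, strict pattern: the padding sits between M and the end */
  if prxmatch(rx_strict, string) then
    no_ws = prxposn(rx_strict, 1, string);

  /* Strict pattern again, but with the padding removed by TRIM first */
  if prxmatch(rx_strict, trim(string)) then
    no_ws_trim = prxposn(rx_strict, 1, trim(string));

  /* Write one line per input value to the log */
  put string= with_ws= no_ws= no_ws_trim=;
  datalines;
A1M_PRE
AC2_0M
GA3_30M
DE3_1H30M
;
```

Because string is declared with length 10 and every test value is shorter, no_ws should come out blank on each row, while with_ws and no_ws_trim should agree with each other.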
Upvotes: 1
Reputation: 163577
If you want to remove everything from the line and keep only the digits preceding M at the end of the line, you could use a capturing group. In the replacement, keep the value of group 1 with $1.
^.*?(?:_[^_\r\n]*?(\d+)M)?$
Explanation
^ Start of string
.*? Match any char, as few as possible
(?: Non-capture group
_[^_\r\n]*? Match _ and then any char except an underscore or a newline, as few as possible
(\d+)M Capture group 1, match 1+ digits followed by M
)? Close group and make it optional
$ End of string
You could make the extract_string the full pattern:
extract_string = "^.*?(?:_[^_\r\n]*?(\d+)M)?$";
my_result1 = prxchange(cat("s/", extract_string, "/$1/"), -1, "A1M_PRE");
Or, if you must keep the leading ^.* in the cat call, use
extract_string = "?(?:_[^_\r\n]*?(\d+)M)?$";
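To try the full-pattern variant on all four test values at once, something like the following could be used. This is a sketch under assumptions: the variables string and result and the use of datalines are illustrative and not from the answer, and the character class is written as [^_\r\n] on the assumption that a carriage return, not a literal r, was meant.

```sas
data _null_;
  length string $10 result $10;
  input string;

  /* Full pattern, as suggested in the answer */
  extract_string = "^.*?(?:_[^_\r\n]*?(\d+)M)?$";

  /* TRIM removes the blanks that pad string to length 10; with the   */
  /* padding left in place, M is no longer directly before the end of */
  /* the value, the optional group is skipped, and $1 stays empty     */
  result = prxchange(cat("s/", extract_string, "/$1/"), -1, trim(string));

  /* Write one line per input value to the log */
  put string= result=;
  datalines;
A1M_PRE
AC2_0M
GA3_30M
DE3_1H30M
;
```

The log should show the values the question asks for: blank for A1M_PRE, then 0, 30, and 30. The string literals in the question carry no padding, which is why the answer's version works there without TRIM.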
Upvotes: 2
Reputation: 110745
I understand that SAS can use Perl's regex engine. The latter supports \K, which directs the engine to discard everything matched so far and reset the starting point of the match to the current location. The following regular expression should therefore match the substring's digits that are of interest.
_.*?\K\d+(?=M$)
A failure to match would be interpreted as an empty string having been matched.
Upvotes: 3