99Valk
99Valk

Reputation: 243

Using PRXPARSE and PRXSUBSTR to extract pieces of a string

Using SAS 9.3 I want to extract the portion of each string between the "." (dot) and '"' (double quote) that follows the dot. For example the result of line one below should be f2015_cnt_cont_line

      <characteristic abc="[2015].f2015_cnt_cont_line" xxxxxxxx="8129" />
      <characteristic abc="[2015].f2015_dbt_cont_line" xxxxxxxx="8134" />
      <characteristic abc="[2015].f2015_ctl_tot_acct_bal" xxxxxxxx="8133" />
      <characteristic abc="[2015].f2015_cnt_comb_line" xxxxxxxx="8118" />
      <characteristic abc="[2015].f2015_dbt_comb_line" xxxxxxxx="8138" />

Does anyone have an example I could use?

Thanks Dan

Upvotes: 3

Views: 11694

Answers (4)

Moacyr Pinheiro
Moacyr Pinheiro

Reputation: 1

That works for me

SUBSTR(
t1.field, 
index(t1.field,'.')+1,
(index(t1.field,'"')-index(t1.field,'.')-1)
)

Upvotes: 0

itzy
itzy

Reputation: 11765

A regular expression that matches your pattern is \.(.*?)\". This means: find a dot (a special character; then any character (the ? makes it not "greedy", so it captures as few characters as possible); and then a quotation mark.

Using this example from the SAS documentation, something like this should work:

data test;
   set _your_data_set;
   retain re;
   if _N_ = 1 then re = prxparse('/\.(.*?)\"/');
   if prxmatch(re, var) then result = prxposn(re, 1, var);
run;

(This assumes your data is in a variable called var.)

Upvotes: 4

DavB
DavB

Reputation: 1696

Instead of using the PRX functions, something like the following might suffice:

text=scan(scan(line,2,"."),1,'"');

This assumes the text is stored in a variable called line.

Upvotes: 3

BellevueBob
BellevueBob

Reputation: 9618

Here's one way:

inner = SCAN(SUBSTR(line,INDEX(line,'.')+1),1,'"');

The inner SUBSTR function skips to the column after the first dot; the outer SCAN function returns the first word delimited by a double quote.

Upvotes: 2

Related Questions