dingaro
dingaro

Reputation: 2342

How to extract some value from character column in PROC SQL in SAS Enterprise Guide?

I have table in SAS like below:

data type:

And I need to create new column "COL1" base on column "TYPE" where I need to extract value which is after character "Top":

So I need something like below:

ID  | TYPE                              | COL1
--- |-----------------------------------|------
111 | {"Top":10}                        | 10
222 | {"abC":"false","Top":111, "aa":5} | 111
333 | {"Top":15.2}                      | 15.2
... | ...                               | ...

How can I do that in PROC SQL in SAS Enterprise Guide ?

Upvotes: 0

Views: 704

Answers (3)

Tom
Tom

Reputation: 51566

Looks like your TYPE variable has JSON strings. If the strings are as simple as your example you can just use nested SCAN() function calls to tease out the values.

So first let's convert your data listing into actual data.

data have;
  input ID TYPE $50.;
cards4;
111 {"Top":10}
222 {"abC":"false","Top":111, "aa":5}
333 {"Top":15.2}
;;;;

Now let's process the JSON text to pull out the name:value pairs. And when the name is Top let's convert the value into a number.

data want;
  set have;
  do index=1 to countw(type,'{,}','q');
     pair = scan(type,index,'{,}','q');
     name = dequote(scan(pair,1,':','q'));
     value = dequote(scan(pair,2,':','q'));
     if name='Top' then number=input(value,32.);
  end;
  drop index pair name value;
run;

Note that if Top appears multiple times in your JSON text the last value seen will be used.

Results:

OBS     ID    TYPE                                 number

 1     111    {"Top":10}                             10.0
 2     222    {"abC":"false","Top":111, "aa":5}     111.0
 3     333    {"Top":15.2}                           15.2

Upvotes: 0

Stu Sztukowski
Stu Sztukowski

Reputation: 12849

A regex-based solution:

data want;
    set have;

    /* Create a regular expression ID */
    regexid = prxparse('/(?<="Top":)(\d+)/');

    /* Find the position and length of the match */
    call prxsubstr(regexid, type, pos, len);

    /* Extract the match and convert it to a number */
    COL1 = input(substr(type, pos, len), 8.);
run;

You can test this out here: https://regex101.com/r/0RDsCJ/1

Upvotes: 2

Kermit
Kermit

Reputation: 3117

Perhaps use the following

data want;
   set have;
   length col1 $50.;
   col1 = strip(scan(tranwrd(type,'"Top":','|'),2,'|},'));
run;
  • Use the tranwrd() function to replace all occurrences of "Top": in type by a vertical bar |.
  • Use the scan() function to extract the substring between the vertical bar | and the closing bracket } or comma ,.
 id                    type                    col1
 111     {"Top":10}                              10
 222     {"abC":"false","Top":111,"aa":5}       111
 333     {"Top":15.2}                          15.2

Upvotes: 2

Related Questions