Reputation: 2342
I have table in SAS like below:
data type:
ID - numeric
TYPE - character
ID | TYPE |
---|---|
111 | {"Top":10} |
222 | {"abC":"false","Top":111, "aa":5} |
333 | {"Top":15.2} |
... | ... |
And I need to create new column "COL1" base on column "TYPE" where I need to extract value which is after character "Top":
So I need something like below:
ID | TYPE | COL1
--- |-----------------------------------|------
111 | {"Top":10} | 10
222 | {"abC":"false","Top":111, "aa":5} | 111
333 | {"Top":15.2} | 15.2
... | ... | ...
How can I do that in PROC SQL in SAS Enterprise Guide ?
Upvotes: 0
Views: 704
Reputation: 51566
Looks like your TYPE variable has JSON strings. If the strings are as simple as your example you can just use nested SCAN() function calls to tease out the values.
So first let's convert your data listing into actual data.
data have;
input ID TYPE $50.;
cards4;
111 {"Top":10}
222 {"abC":"false","Top":111, "aa":5}
333 {"Top":15.2}
;;;;
Now let's process the JSON text to pull out the name:value pairs. And when the name is Top
let's convert the value into a number.
data want;
set have;
do index=1 to countw(type,'{,}','q');
pair = scan(type,index,'{,}','q');
name = dequote(scan(pair,1,':','q'));
value = dequote(scan(pair,2,':','q'));
if name='Top' then number=input(value,32.);
end;
drop index pair name value;
run;
Note that if Top appears multiple times in your JSON text the last value seen will be used.
Results:
OBS ID TYPE number
1 111 {"Top":10} 10.0
2 222 {"abC":"false","Top":111, "aa":5} 111.0
3 333 {"Top":15.2} 15.2
Upvotes: 0
Reputation: 12849
A regex-based solution:
data want;
set have;
/* Create a regular expression ID */
regexid = prxparse('/(?<="Top":)(\d+)/');
/* Find the position and length of the match */
call prxsubstr(regexid, type, pos, len);
/* Extract the match and convert it to a number */
COL1 = input(substr(type, pos, len), 8.);
run;
You can test this out here: https://regex101.com/r/0RDsCJ/1
Upvotes: 2
Reputation: 3117
Perhaps use the following
data want;
set have;
length col1 $50.;
col1 = strip(scan(tranwrd(type,'"Top":','|'),2,'|},'));
run;
tranwrd()
function to replace all occurrences of "Top":
in type
by a vertical bar |
.scan()
function to extract the substring between the vertical bar |
and the closing bracket }
or comma ,
. id type col1
111 {"Top":10} 10
222 {"abC":"false","Top":111,"aa":5} 111
333 {"Top":15.2} 15.2
Upvotes: 2