Reputation: 1
I have a table with parent-child relationship. For a dashboard I am creating, I need to create a table so the dashboard (SAS Viya) can work with the table. For that I need to get for every company id (cid) all parents and children, including the level within the structure (e.g. if it is the 3-rd level child) and a sort_chain which is a chain of numbers from the parent to that child, so be able to group/sort all sort_cid's in the dashboard.
(The name sort_ is because I will later use these columns in a dashboard to display/sort the data properly.)
In short, I'm trying to go from this table:
cid | parent_cid |
---|---|
a | |
b | a |
c | b |
d | a |
e | |
f | |
g | f |
To this table using PROC SQL in SAS:
EDIT: I have updated sort_chain below to reflect what I believe most people feel as more 'logical'.
cid | sort_cid | sort_chain | sort_level |
---|---|---|---|
a | a | 1.0.0 | 1 |
a | b | 1.1.0 | 2 |
a | c | 1.1.1 | 3 |
a | d | 1.2.0 | 2 |
b | a | 1.0.0 | 1 |
b | b | 1.1.0 | 2 |
b | c | 1.1.1 | 3 |
c | a | 1.0.0 | 1 |
c | b | 1.1.0 | 2 |
c | c | 1.1.1 | 3 |
d | a | 1.0.0 | 1 |
d | d | 1.2.0 | 2 |
e | e | 2.0.0 | 1 |
f | f | 3.0.0 | 1 |
f | g | 3.1.0 | 2 |
g | f | 3.0.0 | 1 |
g | g | 3.1.0 | 2 |
How can I achieve this?
What I have tried:
First step was setting all missing parent_cid to itself (lets call this table1):
cid | parent_cid |
---|---|
a | a |
b | a |
c | b |
d | a |
e | e |
f | f |
g | f |
Then I tried self-joining this table to get all known daughters, but then you get an extra column, instead of extra rows:
SELECT
a.cid, a.parent_cid, b.cid AS sort_chain
FROM
table1 a
INNER JOIN
table1 b
ON
b.parent_company_id = a.company_id;
Same for self-join to get all known parents.
I also tried a cross-join, but then the table gets everything x everything, which doesn't make sense.
In short: I feel like I'm thinking in the right direction, but missing something or messing something up. Hope someone can provide me an example code or point me in the right direction. Thanks!
Upvotes: 0
Views: 60
Reputation: 51566
Use a HASH() to find the depth/level and top node for each observation.
data want;
* Load the cid->parent_cid mapping into a hash ;
if _n_=1 then do;
declare hash h(dataset:'have');
h.definekey('cid');
h.definedata('parent_cid');
h.definedone();
end;
* Load next CID value ;
set have ;
* Find their top ancestor and how deep they are ;
do depth=1 by 1 until(h.find());
top=cid;
cid=parent_cid;
end;
* Re-load the same CID value to reset CID and PARENT_CID ;
set have;
run;
To generate your counters use BY group processing. Make sure to make enough counters for the maximum depth your data has.
proc sort;
by top depth cid ;
run;
* find how many levels will be needed ;
proc sql noprint;
select max(depth) into :md trimmed from want;
quit;
* Use FIRST. processing to get the counters ;
* Concatenate to make index string ;
data numbered;
set want;
by top depth cid;
array c[&md] ;
c[depth]+1;
if first.depth then do j=depth+1 to &md;
c[j]=1;
end;
drop j ;
sort_chain=catx('.',of c[*]);
run;
Results
And if we add a new observation CID='h' and PARENT_CID='c' to your example data we will automatically get 4 level sort chain values:
Upvotes: 1