Reputation: 59
I have a SAS dataset, where each row contains information about firms. There is a variable about the sector of the company; sect. There are up to 700 sectors, so I need to group them, like:
Industry: mining, textiles industry, food industry... and so on. I do this with the following code:
data temp; set in.data;
if sect in ('01' '02' '03' '04' '05' '06') then my_sector='Industry' ;
run;
The tricky part is ; I also want to give out details within each sector. For instance, I want to keep this whole "Industry" sector, but I also want to have subgroups within "Industry". But when I run the following code:
data temp; set in.data;
if sect in ('01' '02' '03' '04' '05' '06') then my_sector='Industry' ;
if sect in ('01' '02') then my_sector='Textile Industry' ;
if sect in ('03' '04') then my_sector='Food Industry' ;
if sect in ('05') then my_sector='Mining Industry' ;
run;
The subgroups work just fine, but the global "Industry" sector only contains what is not contained in other subgroups (here the sector 06). So mu question is; how do I have the 3 subgroups but also an "Industry" sector that has all the 3 subgroups and the rest of the industry (from 01 to 06)?
Thank you very much,
Upvotes: 1
Views: 85
Reputation: 27498
For the case of
I also want to have subgroups
you will want to map 'sect' into a two variable hierarchy, such as parent, 'industry', and the child, 'sector'.
In general the control of the mapping is best set off into control tables that will let you easily handle to industry and sector names as new or additional 'sect' values come into play.
data sect_mappings; length sect $2 group $8 sector $8; input
sect group sector; datalines;
01 Industry Textile
02 Industry Textile
03 Industry Food
04 Industry Food
05 Industry Mining
run;
The control table can be left joined to the original data so as to map the 'sect' value to the 'group' and 'sector' names (i.e. formatted values of sect
).
One of the strongest features in SAS is the concept of custom formatting, so you can actually leave the sect
value alone, and process it automatically according to the formatted value.
A custom format can be created directly from the mapping table. In your case you want to two custom formats, one for mapping sect
to group
and the other for sect
to sector
.
data sector_cntlin;
set sect_mappings;
rename sect=start;
fmtname = '$sect_group'; label=group; output;
fmtname = '$sect_name' ; label=sector; output;
run;
proc sort data=sector_cntlin;
by fmtname;
run;
proc format cntlin=sector_cntlin;
run;
The formats can be used to SAS procedures that use sect
as a by
or class
variable when aggregating information.
Suppose you have sample data:
data have;
call streaminit(123);
do company_id = 1 to 100;
sect = put(ceil(rand('uniform', 10)), z2.);
output;
end;
run;
In order to have a two-tier hierarchy (using formatted values), you will need an additional variable that has the sect
value repeated.
data want;
set have;
sect_repeat = sect;
run;
At this point a SAS procedure process the data in a two-level hierarchy if the sect
and sect_repeat
variables are the class
variables.
proc tabulate data=want;
class sect sect_repeat;
format sect $sect_group.;
format sect_repeat $sect_name.;
label sect = 'Group';
label sect_repeat = 'Sector';
table
sect * (all sect_repeat)
, n
/ nocellmerge
;
run;
Outputs a report such as the following. The unmapped values of sect
are obvious and can be added to the mapping table, or filtered out with a where
statement.
Upvotes: 1
Reputation: 1770
After execution first if-statement
you overwrite value of variable. You should use other variable or output
statement:
data want; set have;
length my_sector $20;
if sect in ('01' '02' '03' '04' '05' '06') then do;
my_sector='Industry' ;
output;
if sect in ('01' '02') then do;
my_sector='Textile Industry' ;
output;
end;
if sect in ('03' '04') then do;
my_sector='Food Industry' ;
output;
end;
if sect in ('05') then do;
my_sector='Mining Industry' ;
output;
end;
end;
run;
have dataset:
+------+
| sect |
+------+
| 01 |
| 02 |
| 03 |
| 04 |
| 05 |
| 06 |
+------+
want dataset:
+------+------------------+
| sect | my_sector |
+------+------------------+
| 01 | Industry |
| 01 | Textile Industry |
| 02 | Industry |
| 02 | Textile Industry |
| 03 | Industry |
| 03 | Food Industry |
| 04 | Industry |
| 04 | Food Industry |
| 05 | Industry |
| 05 | Mining Industry |
| 06 | Industry |
+------+------------------+
Upvotes: 0