Reputation: 1031
jrnlfile
is a dataset with journal names and identifiers. Here are the first 6 obs:
id journal issn
56201 ACTA HAEMATOLOGICA 0001-5792
94365 ACTA PHARMACOLOGICA SINICA
10334 ACTA PHARMACOLOGICA SINICA 1671-4083
55123 ADVANCES IN ENZYME REGULATION 0065-2571
90002 AGING
10403 AGING 1945-4589
Compare id
94365 and 10334. These obs name the same journal
. They need the same issn
. An obs with a missing value for issn
almost always has at least one partner obs that contains a matching journal
name and the correct issn
. Wherever this is true, I want to recode the missing issn
so it contains the issn
seen in other instances where the same journal
is mentioned. A revised dataset want
would look like this:
id journal issn
56201 ACTA HAEMATOLOGICA 0001-5792
94365 ACTA PHARMACOLOGICA SINICA 1671-4083
10334 ACTA PHARMACOLOGICA SINICA 1671-4083
55123 ADVANCES IN ENZYME REGULATION 0065-2571
90002 AGING 1945-4589
10403 AGING 1945-4589
I currently use if-else statements in a data step to populate missing issn
values with matching entries for journal
:
data want;
set jrnlfile;
if journal = "ACTA PHARMACOLOGICA SINICA" then issn = "1671-4083";
else if journal = "AGING" then issn = "1945-4589";
/*continue for 7,000 other journals*/
run;
But jrnlfile
contains 50,000 obs and 7,000 unique journals, so this takes a lot of time and is rather error-prone. This answer gets me halfway there, but issn
is not numeric and I can't solve the problem by simply adding values to it.
What is a more efficient and systematic way to get to want
from jrnlfile
?
Upvotes: 2
Views: 377
Reputation: 51591
If the data is sorted by JOURNAL and the valid value appears first then a simple UPDATE might work. But watch out if there are other variables with missing values.
data want;
update have(obs=0) have ;
by journal;
output;
run;
You might try merging the data with the non-missing values of ISSN. That only requires that the data is sorted by JOURNAL. That will work very well if only one unique non-missing value is present. If there are multiple non-missing values then the results are not so nice.
data want ;
merge have have(where=(not missing(issn)) keep=journal issn rename=(issn=_2));
by journal;
if missing(issn) then issn=_2;
drop _2;
run;
Upvotes: 1
Reputation: 726
You can use retain statment. But there is restraints for this code. To empty journal will be set the first found issn. And there are must be a one or more issn for journal group.
proc sort data=JRNLFILE;
by journal descending issn;
run;
data want;
set JRNLFILE;
retain t_issn;
by journal descending issn;
if first.journal then
do;
if issn="" then do;
put "ERROR: there is no issn val for group";
stop;
end;
else t_issn =issn;
end;
if issn="" then
do;
issn=t_issn;
end;
run;
For example. If you use this table:
+-------+------------------------------+-----------+
| id | journal | issn |
+-------+------------------------------+-----------+
| 94365 | ACTA PHARMACOLOGICA SINICA | |
| 10334 | ACTA PHARMACOLOGICA SINICA | 1671-4083 |
| 1 | ACTA PHARMACOLOGICA SINICA | A_TEST |
| 2 | ACTA PHARMACOLOGICA SINICA | WAS |
| 3 | ACTA PHARMACOLOGICA SINICA | SATRTED |
+-------+------------------------------+-----------+
You will get:
+-------+----------------------------+-----------+--------+
| id | journal | issn | t_issn |
+-------+----------------------------+-----------+--------+
| 2 | ACTA PHARMACOLOGICA SINICA | WAS | WAS |
| 3 | ACTA PHARMACOLOGICA SINICA | SATRTED | WAS |
| 1 | ACTA PHARMACOLOGICA SINICA | A_TEST | WAS |
| 10334 | ACTA PHARMACOLOGICA SINICA | 1671-4083 | WAS |
| 94365 | ACTA PHARMACOLOGICA SINICA | WAS | WAS |
+-------+----------------------------+-----------+--------+
Error example. If you use this table:
+-------+------------------------------+-----------+
| id | journal | issn |
+-------+------------------------------+-----------+
| 56201 | ACTA HAEMATOLOGICA | 0001-5792 |
| 94365 | ACTA PHARMACOLOGICA SINICA | |
+-------+------------------------------+-----------+
You will get an ERROR:
ERROR: there is no issn val for group
*t_issn leaved to understand function :))
Upvotes: 1