J.Q
J.Q

Reputation: 1031

recode values in a character variable based on another character variable's value in sas

jrnlfile is a dataset with journal names and identifiers. Here are the first 6 obs:

id      journal                         issn
56201   ACTA HAEMATOLOGICA              0001-5792
94365   ACTA PHARMACOLOGICA SINICA  
10334   ACTA PHARMACOLOGICA SINICA      1671-4083
55123   ADVANCES IN ENZYME REGULATION   0065-2571
90002   AGING   
10403   AGING                           1945-4589

Compare id 94365 and 10334. These obs name the same journal. They need the same issn. An obs with a missing value for issn almost always has at least one partner obs that contains a matching journal name and the correct issn. Wherever this is true, I want to recode the missing issn so it contains the issn seen in other instances where the same journal is mentioned. A revised dataset want would look like this:

id      journal                         issn
56201   ACTA HAEMATOLOGICA              0001-5792
94365   ACTA PHARMACOLOGICA SINICA      1671-4083
10334   ACTA PHARMACOLOGICA SINICA      1671-4083
55123   ADVANCES IN ENZYME REGULATION   0065-2571
90002   AGING                           1945-4589
10403   AGING                           1945-4589

I currently use if-else statements in a data step to populate missing issn values with matching entries for journal:

data want;
    set jrnlfile;
         if journal = "ACTA PHARMACOLOGICA SINICA" then issn = "1671-4083";
    else if journal = "AGING"                      then issn = "1945-4589";
    /*continue for 7,000 other journals*/
    run;

But jrnlfile contains 50,000 obs and 7,000 unique journals, so this takes a lot of time and is rather error-prone. This answer gets me halfway there, but issn is not numeric and I can't solve the problem by simply adding values to it.

What is a more efficient and systematic way to get to want from jrnlfile?

Upvotes: 2

Views: 377

Answers (2)

Tom
Tom

Reputation: 51591

If the data is sorted by JOURNAL and the valid value appears first then a simple UPDATE might work. But watch out if there are other variables with missing values.

data want;
  update have(obs=0) have ;
  by journal;
  output;
run;

You might try merging the data with the non-missing values of ISSN. That only requires that the data is sorted by JOURNAL. That will work very well if only one unique non-missing value is present. If there are multiple non-missing values then the results are not so nice.

data want ;
   merge have have(where=(not missing(issn)) keep=journal issn rename=(issn=_2));
   by journal;
   if missing(issn) then issn=_2;
   drop _2;
run;

Upvotes: 1

Sanek Zhitnik
Sanek Zhitnik

Reputation: 726

You can use retain statment. But there is restraints for this code. To empty journal will be set the first found issn. And there are must be a one or more issn for journal group.

proc sort data=JRNLFILE;
    by journal descending issn;
run;

data want;
    set  JRNLFILE;
    retain t_issn;
    by journal descending issn;

    if first.journal then
        do;
            if issn="" then do;
                put "ERROR: there is no issn val for group";
                stop;
            end;
            else t_issn =issn;
        end;

    if issn="" then
        do;
            issn=t_issn;
        end;
run;

For example. If you use this table:

+-------+------------------------------+-----------+
|  id   |           journal            |    issn   |
+-------+------------------------------+-----------+
| 94365 | ACTA PHARMACOLOGICA SINICA   |           |
| 10334 | ACTA PHARMACOLOGICA SINICA   | 1671-4083 |
|     1 | ACTA PHARMACOLOGICA SINICA   | A_TEST    |
|     2 | ACTA PHARMACOLOGICA SINICA   | WAS       |
|     3 | ACTA PHARMACOLOGICA SINICA   | SATRTED   |
+-------+------------------------------+-----------+

You will get:

+-------+----------------------------+-----------+--------+
|  id   |          journal           |   issn    | t_issn |
+-------+----------------------------+-----------+--------+
|     2 | ACTA PHARMACOLOGICA SINICA | WAS       | WAS    |
|     3 | ACTA PHARMACOLOGICA SINICA | SATRTED   | WAS    |
|     1 | ACTA PHARMACOLOGICA SINICA | A_TEST    | WAS    |
| 10334 | ACTA PHARMACOLOGICA SINICA | 1671-4083 | WAS    |
| 94365 | ACTA PHARMACOLOGICA SINICA | WAS       | WAS    |
+-------+----------------------------+-----------+--------+

Error example. If you use this table:

+-------+------------------------------+-----------+
|  id   |           journal            |    issn   |
+-------+------------------------------+-----------+
| 56201 | ACTA HAEMATOLOGICA           | 0001-5792 |
| 94365 | ACTA PHARMACOLOGICA SINICA   |           |
+-------+------------------------------+-----------+

You will get an ERROR:

ERROR: there is no issn val for group

*t_issn leaved to understand function :))

Upvotes: 1

Related Questions