Roberto
Roberto

Reputation: 729

MERGE BY in SAS - udesirable results with sashelp.class

I want to merge two databases in SAS. I have a task to use MERGE statement for that, although I would prefer to do so using SQL. Nevertheless here is the code I use:

DATA grades;
    INPUT Name $ grades;
    DATALINES;
    Alice 5
    Carol 1
    John 5
    Robert 5
    Thomas 3
    Judy 6
    ;
    PROC SORT DATA=grades;
    BY Name;
RUN;

PROC PRINT DATA=SASHELP.class;
PROC PRINT DATA=grades;

DATA merged;
    MERGE SASHELP.class grades;
    BY Name;
RUN;

PROC PRINT DATA=merged;

The final result I obtain is:

    Obs Name    Sex Age Height  Weight  grades
    1   Alice       .   .       .       5
    2   Carol       .   .       .       1
    3   John        .   .       .       5
    4   Judy        .   .       .       6
    5   Robert      .   .       .       5
    6   Thomas      .   .       .       3
    7   Alfred  M   14  69.0    112.5   .
    8   Alice   F   13  56.5    84.0    .
    9   Barbara F   13  65.3    98.0    .
    10  Carol   F   14  62.8    102.5   .
    11  Henry   M   14  63.5    102.5   .
    12  James   M   12  57.3    83.0    .
    13  Jane    F   12  59.8    84.5    .
    14  Janet   F   15  62.5    112.5   .
    15  Jeffrey M   13  62.5    84.0    .
    16  John    M   12  59.0    99.5    .
    17  Joyce   F   11  51.3    50.5    .
    18  Judy    F   14  64.3    90.0    .
    19  Louise  F   12  56.3    77.0    .
    20  Mary    F   15  66.5    112.0   .
    21  Philip  M   16  72.0    150.0   .
    22  Robert  M   12  64.8    128.0   .
    23  Ronald  M   15  67.0    133.0   .
    24  Thomas  M   11  57.5    85.0    .
    25  William M   15  66.5    112.0   .

If I modify the code by deleting "BY Name" in Merge statement results are not much better:

Obs Name    Sex Age Height  Weight  grades
1   Alice   M   14  69.0    112.5   5
2   Carol   F   13  56.5    84.0    1
3   John    F   13  65.3    98.0    5
4   Judy    F   14  62.8    102.5   6
5   Robert  M   14  63.5    102.5   5
6   Thomas  M   12  57.3    83.0    3
7   Jane    F   12  59.8    84.5    .
8   Janet   F   15  62.5    112.5   .
9   Jeffrey M   13  62.5    84.0    .
10  John    M   12  59.0    99.5    .
11  Joyce   F   11  51.3    50.5    .
12  Judy    F   14  64.3    90.0    .
13  Louise  F   12  56.3    77.0    .
14  Mary    F   15  66.5    112.0   .
15  Philip  M   16  72.0    150.0   .
16  Robert  M   12  64.8    128.0   .
17  Ronald  M   15  67.0    133.0   .
18  Thomas  M   11  57.5    85.0    .
19  William M   15  66.5    112.0   .

Seem s to work until you look at data for Robert for example. The results should be:

Name    Sex Age Height  Weight Grade
Robert  M   12  64.8    128.0  5

not:

Name    Sex Age Height  Weight Grade
Robert  M   14  63.5    102.5   5

It looks that for example Alfred was overwritten. Why data are not consistent? What have I done wrong?

Upvotes: 0

Views: 72

Answers (1)

Richard
Richard

Reputation: 27508

This would appear to be a data issue in which the names in the grade table have one or more leading spaces. The default output styles for HTML and PDF do not make the leading spaces visible. ODS LISTING and view the output in the OUTPUT window should show a leading space.

You can 'fix' the issue with name = left(name); in the grades step. You can force the issue with name = ' ' || left(name);. Check the underlying character data with '$HEX' format.

PROC REPORT data=grades;
  columns name grades name=namehex;
  define namehex / format=$hex20.;
run;

A merge without BY is a special edge case; generally when you know a-priori un-keyed data in table A is related 1:1 row-wise to data in table B.

Upvotes: 3

Related Questions