Reputation: 729
I want to merge two databases in SAS. I have a task to use MERGE statement for that, although I would prefer to do so using SQL. Nevertheless here is the code I use:
DATA grades;
INPUT Name $ grades;
DATALINES;
Alice 5
Carol 1
John 5
Robert 5
Thomas 3
Judy 6
;
PROC SORT DATA=grades;
BY Name;
RUN;
PROC PRINT DATA=SASHELP.class;
PROC PRINT DATA=grades;
DATA merged;
MERGE SASHELP.class grades;
BY Name;
RUN;
PROC PRINT DATA=merged;
The final result I obtain is:
Obs Name Sex Age Height Weight grades
1 Alice . . . 5
2 Carol . . . 1
3 John . . . 5
4 Judy . . . 6
5 Robert . . . 5
6 Thomas . . . 3
7 Alfred M 14 69.0 112.5 .
8 Alice F 13 56.5 84.0 .
9 Barbara F 13 65.3 98.0 .
10 Carol F 14 62.8 102.5 .
11 Henry M 14 63.5 102.5 .
12 James M 12 57.3 83.0 .
13 Jane F 12 59.8 84.5 .
14 Janet F 15 62.5 112.5 .
15 Jeffrey M 13 62.5 84.0 .
16 John M 12 59.0 99.5 .
17 Joyce F 11 51.3 50.5 .
18 Judy F 14 64.3 90.0 .
19 Louise F 12 56.3 77.0 .
20 Mary F 15 66.5 112.0 .
21 Philip M 16 72.0 150.0 .
22 Robert M 12 64.8 128.0 .
23 Ronald M 15 67.0 133.0 .
24 Thomas M 11 57.5 85.0 .
25 William M 15 66.5 112.0 .
If I modify the code by deleting "BY Name" in Merge statement results are not much better:
Obs Name Sex Age Height Weight grades
1 Alice M 14 69.0 112.5 5
2 Carol F 13 56.5 84.0 1
3 John F 13 65.3 98.0 5
4 Judy F 14 62.8 102.5 6
5 Robert M 14 63.5 102.5 5
6 Thomas M 12 57.3 83.0 3
7 Jane F 12 59.8 84.5 .
8 Janet F 15 62.5 112.5 .
9 Jeffrey M 13 62.5 84.0 .
10 John M 12 59.0 99.5 .
11 Joyce F 11 51.3 50.5 .
12 Judy F 14 64.3 90.0 .
13 Louise F 12 56.3 77.0 .
14 Mary F 15 66.5 112.0 .
15 Philip M 16 72.0 150.0 .
16 Robert M 12 64.8 128.0 .
17 Ronald M 15 67.0 133.0 .
18 Thomas M 11 57.5 85.0 .
19 William M 15 66.5 112.0 .
Seem s to work until you look at data for Robert for example. The results should be:
Name Sex Age Height Weight Grade
Robert M 12 64.8 128.0 5
not:
Name Sex Age Height Weight Grade
Robert M 14 63.5 102.5 5
It looks that for example Alfred was overwritten. Why data are not consistent? What have I done wrong?
Upvotes: 0
Views: 72
Reputation: 27508
This would appear to be a data issue in which the names in the grade table have one or more leading spaces. The default output styles for HTML and PDF do not make the leading spaces visible. ODS LISTING
and view the output in the OUTPUT
window should show a leading space.
You can 'fix' the issue with name = left(name);
in the grades
step. You can force the issue with name = ' ' || left(name);
. Check the underlying character data with '$HEX' format.
PROC REPORT data=grades;
columns name grades name=namehex;
define namehex / format=$hex20.;
run;
A merge without BY
is a special edge case; generally when you know a-priori un-keyed data in table A is related 1:1 row-wise to data in table B.
Upvotes: 3