Reputation: 167
I have a table in SAS and it looks like this: The primary key is Name-Surname.
Row Name Surname Country Sec Salary
1 Foo Bar SP 1 1500
2 Foo Bar SP 2
3 Foo Bar 3 1500
4 Foo1 Bar1 1 2000
5 Foo1 Bar1 IT 2 2000
6 Foo1 Bar1 IT 3 2000
7 Foo1 Bar1 IT 4
8 Foo2 Bar2 PO 1
8 Foo2 Bar2 2 850
9 Foo2 Bar2 3
10 Foo2 Bar2 PO 4
It has empty fields, how can I fill it so that they are as in the table below?
Row Name Surname Country Sec Salary
1 Foo Bar SP 1 1500
2 Foo Bar SP 2 1500
3 Foo Bar SP 3 1500
4 Foo1 Bar1 IT 1 2000
5 Foo1 Bar1 IT 2 2000
6 Foo1 Bar1 IT 3 2000
7 Foo1 Bar1 IT 4 2000
8 Foo2 Bar2 PO 1 850
8 Foo2 Bar2 PO 2 850
9 Foo2 Bar2 PO 3 850
10 Foo2 Bar2 PO 4 850
Thank you.
Upvotes: 0
Views: 75
Reputation: 27498
A DOW loop can to process the by groups to identify the 1st non-missing value which is then to be used as the imputation value.
data have; input
Row Name $ Surname $ Country $ Sec Salary; datalines;
1 Foo Bar SP 1 1500
2 Foo Bar SP 2 .
3 Foo Bar . 3 1500
4 Foo1 Bar1 . 1 2000
5 Foo1 Bar1 IT 2 2000
6 Foo1 Bar1 IT 3 2000
7 Foo1 Bar1 IT 4 .
8 Foo2 Bar2 PO 1 .
8 Foo2 Bar2 . 2 850
9 Foo2 Bar2 . 3 .
10 Foo2 Bar2 PO 4 .
;
data want;
do _n_ = 1 by 1 until (last.surname);
set
have (obs=0 rename=(country=_1st_country salary=_1st_salary))
have
;
by name surname;
if missing(_1st_country) then if not missing(country) then _1st_country = country;
if missing(_1st_salary ) then if not missing(salary ) then _1st_salary = salary;
end;
do _n_ = 1 to _n_;
set have;
if missing(country) then country = _1st_country;
if missing(salary ) then salary = _1st_salary;
OUTPUT;
end;
drop _1st:;
run;
Upvotes: 1
Reputation: 3845
Assuming your data is sorted by Name and Surname and you want to take over values only from lines with the same name and surname, For each combination of name and surname read in all data twice.
data want;
set have (in=first_visit) have (in=second_visit);
by Name Surname;
The first visit remember the Country and Salary from the lines it is filled in. In case different non missing values exist, put a warning in the log.
if first_visit then do;
if first.Surname then do;
_Country = Country;
_Salary = Salary;
end;
else do;
if missing(_Country) then _Country = Country;
else if _Country ne Country and not missing(Country) then put
'WARNING: different values:' Country= ' and ' _Country
' for ' Name= Surname=;
if missing(_Salary) then _Salary = Salary;
else if _Salary ne Salary and not missing(Salary) then put
'WARNING: different values:' Salary= ' and ' _Salary
' for ' Name= Surname=;
end;
end;
The second visit, fill in the blanks with the values retained from the first visit. (Note that we don't need the variable second_visit
, but it is easier to understand if I define it anyway.)
else do; * this is the _second_visit ;
if missing(Country) then Country = _Country;
if missing(Salary) then Salary = _Salary;
end;
To make this work, we must explicitly retain the temporary values, because SAS initialises all variables for each observation by default. (I started all their names with _
, because I can then refer them with a wildcard, but that only works if you put the retain statement after the creation of the variables.)
retain _:;
As the retained values have no further use, drop them from the result. _(Note that first_visit and second_visit are also dropped, because of the way we defined them.)_
drop _:;
run;
Upvotes: 1