lekythos
lekythos

Reputation: 123

Copy a value through column if identifiers are the same

I would like to copy a value if it's identifier can found more than one time. For exemple, here, I'd like to spread the "111" through all the "1" and the "222" through all the "2" in colA

Col A   Col B    Col C
1         111     500
1          -        -
1          -       888
2          -      555
2        222       -
2          -       -

After the process, i'd like to have

Col A   Col B    Col C
1         111     500
1         111     -
1         111     888
2         222     555
2         222     - 
2         222     -

I use SAS at my job but I don't care using something else. However I have 16M of entries so might be better not to export in csv and doing everything in SAS.

Thanks for any help !

Upvotes: 0

Views: 107

Answers (3)

supp
supp

Reputation: 1

Here is a Proc SQL solution that I think meets your requirements

    /** Create dataset **/
data have;
    input A $2. B 4.0   C 3.0  ;
datalines;
1 111 500
1 
1     888
2     555
2 222   
2  
;
run;


proc sql;
    create table want as
        select t1.A, t2.B, t1.C
        from have as t1
        left join (select A, max(B) as B from have group by A) as t2 on t1.A = t2.A;
quit;

Upvotes: 0

Richard
Richard

Reputation: 27516

You can compute the max with Proc MEANS and update the target data set in place.

data have;
input A B C; datalines;
1 111 500
1  .   .
1  .  888
2  .  555
2 222  .
2  .   .
;

proc means nway noprint data=have;
  class a;
  var b;
  output max=b_max out=maxes(drop=_type_ _freq_);
run;

proc sql;
  update have 
  set b=(select b_max from maxes where maxes.a=have.a)
  where missing(b)
  ;

%let syslast = have;

Other ways would rewrite the entire result data set

* rely on Proc SQL automatic remerge feature (SAS only);

proc sql;
  create table want as
  select 
    a,
    case when b is missing then max(b) else b end as b,

/* can also compute with coalesce */
    coalesce(b,max(b)) as b_2nd_way,

    c
  from
    have
  group by
    a;

or you can use DOW loop technique;

data want(drop=_:);
  do until (last.A);            * compute over group loop;
    set have; 
    by A; 
    _max_b = max(_max_b, b);
  end;
  do until (last.A);            * apply over group loop;
    set have; 
    by A;
    if missing(b) then b = _max_b;
    OUTPUT;
  end;
run;

Upvotes: 0

T. Peter
T. Peter

Reputation: 887

I believe SAS is mostly relate to sql-server, so I'll use sql-server query to solve the problem.

select col1,max(col2) over(partition by col1) as col2,col3 from [tableName]

this should grant the expect output. check db<>fiddle

and in SAS I think you need to use this:

proc sql;
select col1,max(col2) over(partition by col1) as col2,col3 from [tableName]
quit;

Upvotes: 2

Related Questions