Reputation: 728

select only a few columns from a large table in SAS

I need to join two tables on a key (say ABC) and update a single column in table A using a coalesce function: coalesce(a.status_cd, b.status_cd).

TABLE A: contains about 100 columns. Key column: ABC.

TABLE B: contains just 2 columns: the key column ABC and status_cd.

Table A, the left side of this left join, has more than 100 columns, so I would rather not list them all. Is there a way to use a.* followed by this coalesce function in PROC SQL, so that the CREATE TABLE AS ... step replaces status_cd instead of creating a new column?

Thanks in advance.

Upvotes: 0

Answers (2)

Tom

Reputation: 51621

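You can use a dataset option to rename the original column as the table is read, which lets you keep the a.* wildcard in the select statement. Note that this changes the column order: the new status_cd ends up as the last column, and the renamed old_status column is still carried along by a.*.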

proc sql ;
  create table want as
    select a.*
         , coalesce(a.old_status,b.status_cd) as status_cd
    from tableA(rename=(status_cd=old_status)) a
    left join tableB b
      on a.abc = b.abc
  ;
quit;
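
The query above leaves the renamed old_status column in the result. If you would rather not keep it, one option is to add a drop= dataset option to the output table as well. This is a sketch that reuses the same names (tableA, tableB, want, old_status) as the query above, not a tested solution for your data:

```sas
proc sql;
  /* drop= on the output table removes the helper column when want is written */
  create table want(drop=old_status) as
    select a.*
         , coalesce(a.old_status, b.status_cd) as status_cd
    from tableA(rename=(status_cd=old_status)) a
    left join tableB b
      on a.abc = b.abc
  ;
quit;
```

status_cd still ends up as the last column of want rather than in its original position.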

Upvotes: 1

user667489

Reputation: 9569

I eventually found a fairly simple way of doing this in proc sql after working through several more complex approaches:

proc sql noprint;
  update master a
  set status_cd = coalesce(status_cd,
                           (select status_cd
                            from transaction b
                            where a.ABC = b.ABC))
  where exists (select 1
                from transaction b
                where a.ABC = b.ABC);
quit;

This will update just the one column you're interested in and will only update it for rows with key values that match in the transaction dataset.
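
To try the pattern out, here is a minimal sketch using sashelp.class as stand-in data. It assumes name plays the role of the key and weight the role of status_cd; the master and transaction datasets are illustrative only, not your real tables.

```sas
/* Stand-in master table: weight is missing for the first 5 rows */
data master;
  set sashelp.class;
  if _n_ <= 5 then call missing(weight);
run;

/* Stand-in transaction table: 10 keys, weight missing for rows 6-10 */
data transaction;
  set sashelp.class(obs = 10 keep = name weight);
  if _n_ > 5 then call missing(weight);
run;

proc sql noprint;
  update master a
  /* Keep the existing value if present, otherwise take the transaction value */
  set weight = coalesce(a.weight,
                        (select b.weight
                         from transaction b
                         where a.name = b.name))
  /* Only touch rows whose key appears in the transaction table */
  where exists (select 1
                from transaction b
                where a.name = b.name);
quit;
```

After this runs, the first 5 rows of master should have their weight filled in from transaction, and every other row should be unchanged.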

Earlier attempts:

The most obvious bit of more general SQL syntax would seem to be the update...set...from...where pattern, as used in the top few answers to this question. However, proc sql does not currently support this syntax: the documentation for its update statement allows only a where clause, not a from clause.

If you are running a pass-through query to another database that does support this syntax, it might still be a viable option.

Alternatively, there is a way to do this within SAS via a data step, provided that the master dataset is indexed on your key variable:

/*Create indexed master dataset with some missing values*/
data master(index = (name));
  set sashelp.class;
  if _n_ <= 5 then call missing(weight);
run;

/*Create transaction dataset with some missing values*/
data transaction;
  set sashelp.class(obs = 10 keep = name weight);
  if _n_ > 5 then call missing(weight);
run;

data master;
  set transaction;
  t_weight = weight;
  modify master key = name;
  if _IORC_ = 0 then do;
      weight = coalesce(weight, t_weight);
      replace;
  end;
  /*Suppress log messages if there are key values in transaction but not master*/  
  else _ERROR_ = 0; 
run;

A standard warning relating to the modify statement: if this data step is interrupted, the master dataset may be irreparably damaged, so make sure you have a backup first.

In this case I've assumed that the key variable is unique - a slightly more complex data step is needed if it isn't.

Another way to work around the lack of a from clause in the proc sql update statement would be to set up a format merge, e.g.

data v_format_def /view = v_format_def;
    set transaction(rename = (name = start weight = label));
    retain fmtname 'key' type 'i';
    end = start;
run;

proc format cntlin = v_format_def; run;

proc sql noprint;
    update master 
        set weight = coalesce(weight,input(name,key.))
        where master.name in (select name from transaction);
quit;

In this scenario I've used type = 'i' in the format definition to create a numeric informat, which proc sql uses to convert the character variable name to the numeric variable weight. Depending on whether your key and status_cd columns are character or numeric, you may need to do this slightly differently.
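
As an example of such a variation, here is a rough sketch for the case where both the key and status_cd are character. It assumes the table and column names from the question (tableA, tableB, abc, status_cd) and that abc is unique in tableB. The format name stat is made up for illustration. It uses a character format with put() instead of a numeric informat with input():

```sas
/* Build a character format ($stat.) mapping each key value to its status_cd */
data v_format_def / view = v_format_def;
    set tableB(rename = (abc = start status_cd = label));
    retain fmtname 'stat' type 'c';
run;

proc format cntlin = v_format_def; run;

proc sql noprint;
    update tableA
        /* put() looks up the status for this key; the where clause */
        /* restricts the update to keys that are present in tableB  */
        set status_cd = coalesce(status_cd, put(abc, $stat.))
        where abc in (select abc from tableB);
quit;
```

If status_cd in tableA is shorter than the values in tableB, the looked-up values will be truncated to the existing column length.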

This approach effectively loads the entire transaction dataset into memory when using the format, which might be a problem if you have a very large transaction dataset. The data step approach should hardly use any memory as it only has to load 1 row at a time.

Upvotes: 0
