IceCreamToucan
IceCreamToucan

Reputation: 28685

Adding rows to a dataset when they are missing in the given group

Say I have a dataset like this

  day product sales
1   a       1    48
2   a       2    55
3   a       3    88
4   b       2    33
5   b       3    87
6   c       1    97
7   c       2    95

On day "b" there were no sales for product 1, so there is no row where day = b and product = 1. Is there an easy way to add a row with day = b, product = 1 and sales = 0, and similar "missing" rows to get a dataset like this?

  day product sales
1   a       1    48
2   a       2    55
3   a       3    88
4   b       1     0
5   b       2    33
6   b       3    87
7   c       1    97
8   c       2    95
9   c       3     0

In R you can do complete(df, day, product, fill = list(sales = 0)). I realize you can accomplish this with a self-join in proc sql, but I'm wondering if there is a procedure for this.

Upvotes: 0

Views: 560

Answers (3)

momo1644
momo1644

Reputation: 1804

You can do this with proc freq using the sparse option.

Code:

proc freq data=have noprint;
table day*product /sparse out=freq (drop=percent);
run;

Output:

 day=a product=1 COUNT=1 
 day=a product=2 COUNT=1 
 day=a product=3 COUNT=1 
 day=b product=1 COUNT=0 
 day=b product=2 COUNT=1 
 day=b product=3 COUNT=1 
 day=c product=1 COUNT=1 
 day=c product=2 COUNT=1 
 day=c product=3 COUNT=0 

Upvotes: 1

Reeza
Reeza

Reputation: 21274

In this particular example you can also use the SPARSE option in PROC FREQ. It tells SAS to generate all the complete types with every value from DAY included with PRODUCT, so similar to a cross join between those elements. If you do not have the value in the table already it cannot add the value. You would need a different method in that case.

data have;
input n day $ product sales;
datalines;
1   a       1    48
2   a       2    55
3   a       3    88
4   b       2    33
5   b       3    87
6   c       1    97
7   c       2    95
;;;;
run;

proc freq data=have noprint;
table day*product / out=want sparse;
weight sales;
run;

proc print data=want;run;

Upvotes: 3

Joe
Joe

Reputation: 63424

There are, as usual in SAS, about a dozen ways to do this. Here's my favorite.

data have;
input n day $ product sales;
datalines;
1   a       1    48
2   a       2    55
3   a       3    88
4   b       2    33
5   b       3    87
6   c       1    97
7   c       2    95
;;;;
run;

proc means data=have completetypes;
  class day product;
  types day*product;
  var sales;
  output out=want sum=;
run;

completetypes tells SAS to put out rows for every class combination, including missing ones. You could then use proc stdize to get them to be 0's (if you need them to be 0). It's possible you might be able to do this in the first place with proc stdize, I'm not as familiar unfortunately with that proc.

Upvotes: 3

Related Questions