GitCrush
GitCrush

Reputation: 113

SQL in SAS: Trying to count zero values by left-join

Despite thorough research on my issue, I was not yet able to solve my issue that I encountered with an SQL-query.

I have a two-dimensional query on a customer-database. I would like to sort all purchase amounts by customer and product. Since not all customers purchased every product, it is mandatory to also display the products with zero purchase from every customer.

I have two tables: The big table purchases with all single transactions and a second table products, that was generated by a simple query from the purchases database.

That table is used to list all possible products.

My query looks like the following:

proc sql; 
create table customers_products 
as select  a.customers as customers, b.products as product,sum(a.amount) as amount format=comma16.0, count(a.amount) as transactions format=comma16.0
from products as b 
left join work.transactions as a on b.product = a.product
group by a.customers , b.products 
order by a.products, amount desc; 
quit;  

However, there are no zero-counts. Any help is appreciated!

Upvotes: 1

Views: 388

Answers (2)

DomPazz
DomPazz

Reputation: 12465

There might be a simpler way to do this, but here is how I would tackle your problem

First: here is some sample purchases. Each customer has a 60% chance of buying a product 1 or more times:

data purchases;
do customer=1 to 10;
    do product=1 to 10;
        if ranuni(1) > .4 then do;
            x = ceil(4*ranuni(1)); /*Buy 1 to 4 times*/
            do i=1 to x;
                amount = ranuni(1)*10;
                output;
            end;
        end;
    end;
end;
drop i x;
run;

Next, I get the distinct products and customers, and I get the totals for each customer/product group.

proc sql noprint;
create table products as
select distinct product from purchases;

create table customers as
select distinct customer from purchases;

create table totals as
select customer, product, sum(amount) as amt
from purchases
group by customer, product;
quit;

Your problem involves a Cartesian product on customers and products -- i.e. you want all customers and products. So I go ahead and do that here.

proc sql noprint;
create table customers_x_products as 
select a.customer, b.product
from customers as a, 
     products as b;
quit;

Finally, I put that customers_x_products table with the totals table to produce the table I think you are looking for:

proc sql noprint;
create table customers_products as
select b.customer,
       b.product,
       coalesce(a.amt,0) as amt
    from customers_x_products as b
      left join
         totals as a
      on a.product=b.product 
      and a.customer=b.customer
    order by b.customer, b.product;
quit;

Upvotes: 1

user2877959
user2877959

Reputation: 1792

Does this solve your proplem? Lacking sample data and desired output, I could not test it.

proc sql; 
create table customers_products as
select a.customers as customers
      ,b.products as product
      ,sum(a.amount) as amount format=comma16.0
      ,case
        when sum(a.amount) is missing then 0
        else count(a.amount) 
       end as transactions format=comma16.0
from products as b 
left join work.transactions as a on b.product = a.product
group by a.customers , b.products 
order by a.products, amount desc; 
quit; 

EDIT: I see now that, in your query you have both a product and a products variable. Is that a typo?

EDIT 2: I made some sample data and ran your query on it

data products;
infile datalines;
input product $;
datalines;
a
b
c
d
;
run;
data transactions;
infile datalines;
input customers $ product $ amount;
datalines;
X a 3
X b 1
Y c 5
;
run;

proc sql; 
create table customers_products as
select a.customers as customers
            ,b.product as product
            ,sum(a.amount) as amount format=comma16.0
            ,count(a.amount) as transactions format=comma16.0
from products as b 
left join transactions as a on b.product = a.product
group by a.customers , b.product 
order by a.product, amount desc; 
quit;

This is your exact query except that I assumed your varying references to prodcuts and product were a mistake and I changed them all to product. This produces the following result:

customers product amount transactions
-------------------------------------
          d       .      0
X         a       3      1
X         b       1      1
Y         c       5      1

As you can see, I get a record with zero-value count of transaction for product b. So I'm not sure I understand what your problem is (again stressing why it is really helpful if questions include sample data and desired output).

Upvotes: 1

Related Questions