Reputation: 113
Despite thorough research on my issue, I was not yet able to solve my issue that I encountered with an SQL-query.
I have a two-dimensional query on a customer-database. I would like to sort all purchase amounts by customer and product. Since not all customers purchased every product, it is mandatory to also display the products with zero purchase from every customer.
I have two tables: The big table purchases
with all single transactions and a second table products
, that was generated by a simple query from the purchases
database.
That table is used to list all possible products.
My query looks like the following:
proc sql;
create table customers_products
as select a.customers as customers, b.products as product,sum(a.amount) as amount format=comma16.0, count(a.amount) as transactions format=comma16.0
from products as b
left join work.transactions as a on b.product = a.product
group by a.customers , b.products
order by a.products, amount desc;
quit;
However, there are no zero-counts. Any help is appreciated!
Upvotes: 1
Views: 388
Reputation: 12465
There might be a simpler way to do this, but here is how I would tackle your problem
First: here is some sample purchases. Each customer has a 60% chance of buying a product 1 or more times:
data purchases;
do customer=1 to 10;
do product=1 to 10;
if ranuni(1) > .4 then do;
x = ceil(4*ranuni(1)); /*Buy 1 to 4 times*/
do i=1 to x;
amount = ranuni(1)*10;
output;
end;
end;
end;
end;
drop i x;
run;
Next, I get the distinct products and customers, and I get the totals for each customer/product group.
proc sql noprint;
create table products as
select distinct product from purchases;
create table customers as
select distinct customer from purchases;
create table totals as
select customer, product, sum(amount) as amt
from purchases
group by customer, product;
quit;
Your problem involves a Cartesian product on customers and products -- i.e. you want all customers and products. So I go ahead and do that here.
proc sql noprint;
create table customers_x_products as
select a.customer, b.product
from customers as a,
products as b;
quit;
Finally, I put that customers_x_products table with the totals table to produce the table I think you are looking for:
proc sql noprint;
create table customers_products as
select b.customer,
b.product,
coalesce(a.amt,0) as amt
from customers_x_products as b
left join
totals as a
on a.product=b.product
and a.customer=b.customer
order by b.customer, b.product;
quit;
Upvotes: 1
Reputation: 1792
Does this solve your proplem? Lacking sample data and desired output, I could not test it.
proc sql;
create table customers_products as
select a.customers as customers
,b.products as product
,sum(a.amount) as amount format=comma16.0
,case
when sum(a.amount) is missing then 0
else count(a.amount)
end as transactions format=comma16.0
from products as b
left join work.transactions as a on b.product = a.product
group by a.customers , b.products
order by a.products, amount desc;
quit;
EDIT: I see now that, in your query you have both a product
and a products
variable. Is that a typo?
EDIT 2: I made some sample data and ran your query on it
data products;
infile datalines;
input product $;
datalines;
a
b
c
d
;
run;
data transactions;
infile datalines;
input customers $ product $ amount;
datalines;
X a 3
X b 1
Y c 5
;
run;
proc sql;
create table customers_products as
select a.customers as customers
,b.product as product
,sum(a.amount) as amount format=comma16.0
,count(a.amount) as transactions format=comma16.0
from products as b
left join transactions as a on b.product = a.product
group by a.customers , b.product
order by a.product, amount desc;
quit;
This is your exact query except that I assumed your varying references to prodcuts
and product
were a mistake and I changed them all to product
.
This produces the following result:
customers product amount transactions
-------------------------------------
d . 0
X a 3 1
X b 1 1
Y c 5 1
As you can see, I get a record with zero-value count of transaction for product b
. So I'm not sure I understand what your problem is (again stressing why it is really helpful if questions include sample data and desired output).
Upvotes: 1