Vsevolod
Vsevolod

Reputation: 23

T-SQL query for online shop products filter

I need to create a stored procedure which returns products characteristics with its products count. Input parameters are:

  1. product category. The query has to return data for this particular category and its child category
  2. selected product characteristics - table valued parameter. If this parameter contains rows then the query has to calculate products count for each characteristic for products which has this particular characteristics too.

Questions:

  1. Database contains 500 000+ product rows, so what would be the best solution (t-sql query) in terms of perfomance?
  2. I've tried to make a query presented below but I think it's ugly and actually doesn't calculate products count correctly. And I need a professional assistance to make a correct query which works as fast as possible

Table scripts and sample data:

--categories
    create table tCategory(c_id int identity(1,1) primary key, c_name nvarchar(200), c_parent int)
    insert into tCategory(c_name, c_parent) select 'Smartphones', null
    insert into tCategory(c_name, c_parent) select 'iPhone 6S', 1
    insert into tCategory(c_name, c_parent) select 'iPhone 7', 1
    insert into tCategory(c_name, c_parent) select 'iPhone 7 Plus', 1
    
    --products
    create table tProduct(p_id int identity(1,1) primary key, p_name nvarchar(200), c_id int)
    insert into tProduct(p_name, c_id) select '4.7" Apple iPhone 6S 32 gb gold', 2
    insert into tProduct(p_name, c_id) select '4.7" Apple iPhone 6S 32 gb brown', 2
    insert into tProduct(p_name, c_id) select '4.7" Apple iPhone 7 32 gb pink', 3
    insert into tProduct(p_name, c_id) select '4.7" Apple iPhone 7 32 gb brown', 3
    insert into tProduct(p_name, c_id) select '5.5" Apple iPhone 7 Plus 32 gb black', 4
    insert into tProduct(p_name, c_id) select '4.7" Apple iPhone 6S 128 gb pink', 2
    
    --characteristics type (color, size etc.)
    create table tProductCharItem(pci_id int identity(1,1) primary key, pci_name nvarchar(200))
    insert into tProductCharItem(pci_name) select 'Display'
    insert into tProductCharItem(pci_name) select 'Color'
    insert into tProductCharItem(pci_name) select 'Memory'
    
    --characteristics value (blue, 50х50 etc.)
    create table tProductCharItemValue(pciv_id int identity(1,1) primary key, pci_id int, pciv_value nvarchar(50))
    insert into tProductCharItemValue(pci_id, pciv_value) select 1, '4.7"'
    insert into tProductCharItemValue(pci_id, pciv_value) select 1, '5.5"'
    insert into tProductCharItemValue(pci_id, pciv_value) select 2, 'gold'
    insert into tProductCharItemValue(pci_id, pciv_value) select 2, 'brown'
    insert into tProductCharItemValue(pci_id, pciv_value) select 2, 'pink'
    insert into tProductCharItemValue(pci_id, pciv_value) select 2, 'black'
    insert into tProductCharItemValue(pci_id, pciv_value) select 3, '32 gb'
    insert into tProductCharItemValue(pci_id, pciv_value) select 3, '128 gb'
    
    --products characteristics
    create table tProductChar(pc_id int identity(1,1) primary key, p_id int, pciv_id int)
    insert into tProductChar(p_id, pciv_id) select 1, 1
    insert into tProductChar(p_id, pciv_id) select 1, 7
    insert into tProductChar(p_id, pciv_id) select 1, 3
    insert into tProductChar(p_id, pciv_id) select 2, 1
    insert into tProductChar(p_id, pciv_id) select 2, 4
    insert into tProductChar(p_id, pciv_id) select 2, 7
    insert into tProductChar(p_id, pciv_id) select 3, 1
    insert into tProductChar(p_id, pciv_id) select 3, 5
    insert into tProductChar(p_id, pciv_id) select 3, 7
    insert into tProductChar(p_id, pciv_id) select 4, 1
    insert into tProductChar(p_id, pciv_id) select 4, 4
    insert into tProductChar(p_id, pciv_id) select 4, 7
    insert into tProductChar(p_id, pciv_id) select 5, 2
    insert into tProductChar(p_id, pciv_id) select 5, 6
    insert into tProductChar(p_id, pciv_id) select 5, 7
    insert into tProductChar(p_id, pciv_id) select 6, 1
    insert into tProductChar(p_id, pciv_id) select 6, 5
    insert into tProductChar(p_id, pciv_id) select 6, 8

Expected result when user didn't select any filter:

+--------+---------+----------+------------+----------------+
| pci_id | pciv_id | pci_name | pciv_value | products_count |
+--------+---------+----------+------------+----------------+
|      1 |       1 | Display  | 4.7"       |              5 |
|      2 |       3 | Color    | gold       |              1 |
|      2 |       4 | Color    | brown      |              2 |
|      2 |       5 | Color    | pink       |              2 |
|      2 |       6 | Color    | black      |              1 |
|      3 |       7 | Memory   | 32 gb      |              5 |
|      3 |       8 | Memory   | 128 gb     |              1 |
|      1 |       2 | Display  | 5.5"       |              1 |
+--------+---------+----------+------------+----------------+

Expected result when user selected filter by color characteristic 'brown'

+--------+---------+----------+------------+----------------+
| pci_id | pciv_id | pci_name | pciv_value | products_count |
+--------+---------+----------+------------+----------------+
|      1 |       1 | Display  | 4.7"       |              2 |
|      2 |       3 | Color    | gold       |              0 |
|      2 |       4 | Color    | brown      |              2 |
|      2 |       5 | Color    | pink       |              0 |
|      2 |       6 | Color    | black      |              0 |
|      3 |       7 | Memory   | 32 gb      |              2 |
|      3 |       8 | Memory   | 128 gb     |              0 |
|      1 |       2 | Display  | 5.5"       |              0 |
+--------+---------+----------+------------+----------------+

Here is my try (ugly and doesn't calculate products count properly):

CREATE TYPE integer_list_tbltype AS TABLE (n int NOT NULL PRIMARY KEY)

declare @c_id int = 1 --category id
declare @pciv_ids integer_list_tbltype --list of selected filters (products characteristics)

insert into @pciv_ids(n) select 4

;with cats as 
(
    select c_id from tCategory where c_id = @c_id
    union all
    select t.c_id from cats 
        inner join tCategory t on cats.c_id = t.c_parent 
),
groupped_pci as (
select distinct p.c_id, pci.pci_id, pciv.pciv_id
from tProductChar pc 
join tProduct p on pc.p_id = p.p_id
join tProductCharItemValue pciv on pc.pciv_id = pciv.pciv_id
join tProductCharItem pci on pciv.pci_id = pci.pci_id),
products_count as (
    select count(distinct p.p_id) cnt, pc.pciv_id
    from tProduct p join tProductChar pc on p.p_id = pc.p_id
    cross apply (select * from tProductChar pc left join @pciv_ids t on pc.pciv_id = t.n where p_id = p.p_id and pc.pciv_id is not null) t
    group by pc.pciv_id
)

select pci.pci_id, pciv.pciv_id, pci.pci_name, pciv.pciv_value, pc.cnt products_count
from groupped_pci
join cats on cats.c_id = groupped_pci.c_id
join tProductCharItem pci on groupped_pci.pci_id = pci.pci_id
join tProductCharItemValue pciv on groupped_pci.pciv_id = pciv.pciv_id
left join products_count pc on groupped_pci.pciv_id = pc.pciv_id

Upvotes: 2

Views: 741

Answers (3)

casenonsensitive
casenonsensitive

Reputation: 950

Here's a TVF that gives you the result - with or without color parameter:

create function tvf_get_products_count(@product_color nvarchar(200))
returns table as
return
with has_attribute_of_given_color as (
select pci.pci_id, pciv.pciv_id, pci.pci_name, pciv.pciv_value, pc.pc_id, pColor.p_id,
       count(pColor.p_id) over (partition by pc.pc_id, pcivColor.pciv_value) as has_attribute_of_given_color 
  from tProductCharItem pci 
  join tProductCharItemValue pciv 
    on pci.pci_id = pciv.pci_id
  join tProductChar pc 
    on pc.pciv_id = pciv.pciv_id
  left join (tProduct pColor 
       join tProductChar pcColor 
         on pColor.p_id = pcColor.p_id 
       join tProductCharItemValue pcivColor 
         on pcColor.pciv_id = pcivColor.pciv_id
       join tProductCharItem pciColor
         on pcivColor.pci_id = pciColor.pci_id
        and pciColor.pci_name = 'Color'
        and pcivColor.pciv_value = isnull(nullif(@product_color, ''), pcivColor.pciv_value))
    on pc.p_id = pColor.p_id
)
select pci_id, pciv_id, pci_name, pciv_value, sum(has_attribute_of_given_color) as products_count 
  from has_attribute_of_given_color
 group by pci_id, pciv_id, pci_name, pciv_value;

Use it like this:

select * from tvf_get_products_count(null) order by 1, 2;
select * from tvf_get_products_count('brown') order by 1, 2;

In order to get these results: Results of TVFs with or without color parameter

From the performance point of view this one could be the most promising. It is necessary to join twice all of the attributes - in contrast to my other answer this is achieved with an outer join that filters only on color attributes instead of using subqueries. The result is then grouped again on the desired level.

Upvotes: 1

casenonsensitive
casenonsensitive

Reputation: 950

Here's a query to find your results without the filter:

select pci.pci_id, pciv.pciv_id, pci.pci_name, pciv.pciv_value,
       count(*)
  from tProductCharItem pci 
  join tProductCharItemValue pciv on pci.pci_id = pciv.pci_id
  join tProductChar pc on pc.pciv_id = pciv.pciv_id
 group by pci.pci_id, pciv.pciv_id, pci.pci_name, pciv.pciv_value
 order by 1, 2

Here's a query to find your result with the filter on brown:

select distinct pci.pci_id, pciv.pciv_id, pci.pci_name, pciv.pciv_value,
  sum(case when exists (select null 
                          from tProduct pColor 
                          join tProductChar pcColor 
                            on pColor.p_id = pcColor.p_id 
                          join tProductCharItemValue pcivColor 
                            on pcColor.pciv_id = pcivColor.pciv_id 
                           and pciv_value = 'brown' 
                         where pc.p_id = pColor.p_id)
      then 1 
      else 0 
      end) 
  over (partition by pci.pci_id, pciv.pciv_id, pci.pci_name, pciv.pciv_value)
  from tProductCharItem pci 
  join tProductCharItemValue pciv on pci.pci_id = pciv.pci_id
  join tProductChar pc on pc.pciv_id = pciv.pciv_id
 order by 1, 2

The first one should be pretty clear, so I'll explain what happens in the second case: I'm using a window function/analytical function sum(...) over (partition by ...) and then a distinct over all in order to achieve the same as the group by. I had to do it that way, otherwise we'd get the error Cannot perform an aggregate function on an expression containing an aggregate or a subquery.. The style with the distinct could lead to slightly different result, but that is not the case here.

In the window function I'm using a case when in order to simulate a count(*) with a given condition. The condition being exists (select null ...). That condition checks if the given row is a row of a brown product. So if that's the case, the exists will be true, the case when will be 1 and it will sum up.

As has been pointed out by @Mitz, the value for 'brown' is a constant. Should you really want to query it color by color, then replace 'brown' with a variable and put it all in a TVF or stored procedure depending on your need. It is also possible to create such a query that returns the values for all of the colors.

Upvotes: 1

Mitz
Mitz

Reputation: 135

This is what i would do to avoid subqueries:

declare @filters table (pciv_id int)
insert into @filters(pciv_id) values (4)

if exists (select * from @filters)
    set @hasFilter=1

;with cats as 
(
    select c_id from tCategory where c_id = @c_id
    union all
    select t.c_id from cats 
    inner join tCategory t on cats.c_id = t.c_parent 
)
, filteredProducts as 
(
    select p.p_id
    from cats
    inner join tProduct p on p.c_id=cats.c_id
    inner join tProductChar pc on pc.p_id=p.p_id
    inner join tProductCharItemValue pcv on pcv.pciv_id=pc.pciv_id
    left join @filters f on f.pciv_id=pc.pciv_id
    where @hasFilter=0 or f.pciv_id is not null
    group by p.p_id
)
select ci.pci_id, pcv.pciv_id, ci.pci_name, pcv.pciv_value, count(p.p_id) products_count
from tProductCharItem ci
inner join tProductCharItemValue pcv on pcv.pci_id=ci.pci_id
left join tProductChar pc on pc.pciv_id=pcv.pciv_id
left join filteredProducts p on p.p_id=pc.p_id
group by ci.pci_id, pcv.pciv_id, ci.pci_name, pcv.pciv_value

Upvotes: 1

Related Questions