Complicated sql query with variables

Question

That's my query to get first $count rows for each city/subcategory combination

$contacts = $dbh->prepare("
    SELECT *
    FROM (SELECT c.*,
                 (@rn := IF(@cc = CONCAT_WS(':', city_id, subcategory_id), @rn + 1,
                            IF(@cc := CONCAT_WS(':', city_id, subcategory_id), 1, 1)
                           )
                 ) as rn
          FROM (SELECT reg.title as region_title, cnt.title, cnt.city_id, cnt.id, cnt.catalog_id, cnt.address, cnt.phone, cnt.email, cnt.website,  cnt.subcategory_title, cnt.subcategory_id, cnt.manufacturer 
                FROM contacts as cnt
                LEFT JOIN regions as reg 
                ON cnt.city_id = reg.id 
                WHERE city_id IN (".implode(',', $regions).") AND 
                      subcategory_id IN (".implode(',', $categories).") 
                ORDER BY subcategory_title, city_id, title
               ) c CROSS JOIN
               (SELECT @cc := '', @rn := 0) params
         ) c
    WHERE rn <= $count");

And i'm using $contacts->fetchAll(PDO::FETCH_GROUP); to group rows by reg.title

[ 
 ['City 1'] = > [ 
   [ contact 1 ],
   [ contact 2 ],
   ...
 ],
 ['City 2'] = > [ 
   [ contact 3 ],
   [ contact 4 ],
   ...
 ]
 ...
]

Now I need to upgrade that query but it's too complicated for me :( Selected rows must have unique contacts.catalog_id value.
How it can be done?

UPD
Here is a demo database - http://sqlfiddle.com/#!9/ac71d7/2

spencer7593 · Accepted Answer

"We need unique catalog_id globally"

To identify unique values of catalog_id in contacts, we could use a query like this:

   SELECT r.catalog_id
     FROM contacts r
    GROUP BY r.catalog_id
   HAVING COUNT(1) = 1

That says, for a given row in contacts, if the value of catalog_id matches catalog_id on any other row in contacts, that catalog_id will be excluded from the result.

If we want to restrict the original query to returning only those values of catalog_id, we could include this query as an inline view, and join that to rows in contacts with matching catalog_id.

                    FROM contacts cnt
  -- ------------
                    JOIN ( SELECT r.catalog_id
                             FROM contacts r
                            GROUP BY r.catalog_id
                           HAVING COUNT(1) = 1
                         ) s
                      ON s.catalog_id = cnt.catalog_id
  -- ------------
                    LEFT
                    JOIN regions reg
                      ON reg.id = cnt.city_id

EDIT

If the specification is interpreted differently, instead of meaning catalog_id must be unique in contacts, we mean that a catalog_id should not be repeated in the result... we can use the same approach, but get single value of id from contacts for each catalog_id. We could write a query like this:

   SELECT MAX(r.id) AS max_id
        , r.catalog_id
     FROM contacts r
    GROUP BY r.catalog_id

We could use MIN() aggregate in place of MAX(). The goal is to return a single contacts.id for each discrete value of catalog_id.

We can incorporate that into the query as an inline view, matching max_id from the inline view to the id from contacts table.

Something like this:

                    FROM contacts cnt
  -- ------------
                    JOIN ( SELECT MAX(r.id) AS max_id
                             FROM contacts r
                            WHERE ... 
                            GROUP BY r.catalog_id
                         ) s
                      ON s.max_id = cnt.id
  -- ------------
                    LEFT
                    JOIN regions reg
                      ON reg.id = cnt.city_id

We probably want to move the conditions in the WHERE clause of the outer query into that inline view. If we don't, then the max_id returned by the inline view might reference an row (id) in contacts that doesn't satisfy the conditions in the WHERE clause.

Relocating the WHERE conditions on cnt into the inline view ...

SELECT d.*
  FROM ( SELECT c.*
              , ( @rn := IF( @cc = CONCAT_WS(':', city_id, subcategory_id)
                           , @rn + 1
                           , IF( @cc := CONCAT_WS(':', city_id, subcategory_id),1,1)
                         )
                ) AS rn
           FROM ( SELECT reg.title AS region_title
                       , cnt.title
                       , cnt.city_id
                       , cnt.id
                       , cnt.catalog_id
                       , cnt.address
                       , cnt.phone
                       , cnt.email
                       , cnt.website
                       , cnt.category_title
                       , cnt.subcategory_title
                       , cnt.subcategory_id
                       , cnt.manufacturer
                    FROM contacts cnt
  -- --------------
                    JOIN ( SELECT MAX(r.id) AS max_id
                             FROM contacts r
                            WHERE r.city_id        IN ( ... ) 
                              AND r.subcategory_id IN ( ... )
                              AND r.email          IS NOT NULL
                              AND r.manufacturer   = 1
                            GROUP BY r.catalog_id
                         ) s
                      ON s.max_id = cnt.id
  -- --------------
                    LEFT
                    JOIN regions reg
                      ON reg.id = cnt.city_id
                   ORDER
                      BY cnt.category_title
                       , cnt.subcategory_title
                       , cnt.city_id
                       , cnt.title
                ) c
          CROSS
           JOIN ( SELECT @cc := '', @rn := 0) i
       ) d
 WHERE d.rn <= 10

Complicated sql query with variables

Answers (1)

Related Questions