Enthusiast
Enthusiast

Reputation: 831

Need to understand this query logic

I have three tables. One is user table having columns like user_id, displayname,.... Second table is user_values and its structure like

  -------------------------------------
  | id  | item_id |  field_id | value |
  -------------------------------------
  | 1   |   1     |  15       | 2     |
  -------------------------------------
  | 2   |   1     |  15       | 6     |
  -------------------------------------
  | 3   |   1     |  16       | start |
  -------------------------------------
  | 2   |   2     |  15       | 2    |
  -------------------------------------

In this table item_id is actually the user_id joining to user table. In this field every field can have multiple values against one item_id (user_id). Now I need to find certain values against certain fields of user. I have written the following query which is exactly finding the result I needed.

SELECT 
 `eu`.`user_id`, `eu`.`displayname`, 
 GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city , 
 GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END )AS interests , 
 GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END )AS age 
FROM 
 `engine4_users` AS `eu` 
INNER JOIN 
 `engine4_user_fields_values` AS `eufv` 
ON 
 eu.user_id = eufv.item_id 
GROUP BY `eu`.`user_id`

Now there is another table which logs user login history. This table again stores user_id and last_login timestamp. Now I also need user_last login. Now If I write query like

 SELECT 
`eu`.`user_id`, `eu`.`displayname`, 
GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city , 
GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END )AS interests , 
GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END )AS age,
    MAX(eul.timestamp) as user_login 
 FROM 
`engine4_users` AS `eu` 
 INNER JOIN 
`engine4_user_fields_values` AS `eufv` 
 ON 
eu.user_id = eufv.item_id 
 Left Join
    engine4_user_logins as eul
  ON
     eu.user_id - eul.user_id
  GROUP BY `eu`.`user_id`

This query returns wrong result. If a user has 7 entries in login table then this query returns city, age and interets values multiplied by 7. For example for item_id 1, field_id 15 it returns 2,2,2,2,2,2,2,6,6,6,6,6,6,6. I don't know why it is returning result like this.

But if I write a sub query to get last login time, like

 SELECT 
`eu`.`user_id`, `eu`.`displayname`, 
GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city , 
GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END ) AS interests , 
GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END ) AS age,
(SELECT MAX(eul.timestamp) FROM engine4_user_logins AS eul WHERE eul.user_id = eu.user_id) AS last_login 
   FROM 
   `engine4_users` AS `eu` 
   INNER JOIN 
    `engine4_user_fields_values` AS `eufv` 
   ON 
    eu.user_id = eufv.item_id 
   GROUP BY `eu`.`user_id`

Now this query returns exact result. I first want to ask what is wrong in second query which returns field values wrong. I am totally not getting the point. I don't want sub query.

Please first let me know what is wrong in that query and how can i get exact result without subquery.

Upvotes: 0

Views: 54

Answers (1)

spencer7593
spencer7593

Reputation: 108370

To answer your first question, what's happening is that your query is creating a "cross product", when you have multiple matching rows returned from each of the table.

Each matching row from engine4_user_fields_values is getting matched to each row returned from engine4_user_logins. The resulting set is a cross product of those two sets.

This isn't a bug in SQL, it's the expected behavior.

Similar to the result we get from this demonstration query:

SELECT a.i, b.j
  FROM (SELECT 2 AS i UNION ALL SELECT 3 UNION ALL SELECT 5 UNION ALL SELECT 7) a
  JOIN (SELECT 11 AS j UNION ALL SELECT 13 UNION ALL SELECT 17) b

which produces 12 rows (4 rows x 3 rows)


To answer your second question: There's a couple of approaches to dealing with this. One is to avoid creating the cross product, the other approach is to go ahead and produce the cross product and then eliminate duplicates.


Avoiding the cross product would involve either a separate query, or a single query using an inline view (but the inline view is actually a "subquery", and you said you wanted to avoid that.)

But just for the sake of showing how that could be done using a JOIN operation to an inline view (rather than a correlated subquery), here's an example:

SELECT eu.user_id
     , eu.displayname
     , GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city 
     , GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END ) AS interests 
     , GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END ) AS age
     , ll.last_login 
  FROM `engine4_users` eu 
  JOIN `engine4_user_fields_values` eufv
    ON eufv.item_id = eu.user_id
  LEFT
  JOIN ( SELECT eul.user_id
              , MAX(eul.timestamp) AS last_login
           FROM engine4_user_logins eul
          GROUP BY eul.user_id
       ) ll
    ON ll.user_id = eu.user_id
 GROUP BY eu.user_id

The inline view aliased as ll will return at most one row per user_id, so a JOIN to that set won't produce any "duplicates". Performance of the inline view query would be optimized with an appropriate index on (user_id, timestamp).


The other approach is to deal with the "duplicate" values returned from the cross product by eliminating duplicates that are produced. One way to do that is to include the DISTINCT keyword inside the GROUP_CONCAT function. But note that this will remove ALL duplicates, not just duplicates introduced by the cross product.

GROUP_CONCAT(DISTINCT expr)

Note that MySQL may still go through the gyrations of producing the cross product, which could end up being rather large if a user has lots of logins, and there are a lot of rows returned from the other table. And then MySQL would have to go through that whole set to pick out the MAX() and pick out the distinct values.

Upvotes: 1

Related Questions