Reputation: 831
I have three tables. One is user table having columns like user_id, displayname,....
Second table is user_values and its structure like
-------------------------------------
| id | item_id | field_id | value |
-------------------------------------
| 1 | 1 | 15 | 2 |
-------------------------------------
| 2 | 1 | 15 | 6 |
-------------------------------------
| 3 | 1 | 16 | start |
-------------------------------------
| 2 | 2 | 15 | 2 |
-------------------------------------
In this table item_id is actually the user_id joining to user table. In this field every field can have multiple values against one item_id (user_id). Now I need to find certain values against certain fields of user. I have written the following query which is exactly finding the result I needed.
SELECT
`eu`.`user_id`, `eu`.`displayname`,
GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city ,
GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END )AS interests ,
GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END )AS age
FROM
`engine4_users` AS `eu`
INNER JOIN
`engine4_user_fields_values` AS `eufv`
ON
eu.user_id = eufv.item_id
GROUP BY `eu`.`user_id`
Now there is another table which logs user login history. This table again stores user_id and last_login timestamp. Now I also need user_last login. Now If I write query like
SELECT
`eu`.`user_id`, `eu`.`displayname`,
GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city ,
GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END )AS interests ,
GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END )AS age,
MAX(eul.timestamp) as user_login
FROM
`engine4_users` AS `eu`
INNER JOIN
`engine4_user_fields_values` AS `eufv`
ON
eu.user_id = eufv.item_id
Left Join
engine4_user_logins as eul
ON
eu.user_id - eul.user_id
GROUP BY `eu`.`user_id`
This query returns wrong result. If a user has 7 entries in login table then this query returns city, age and interets values multiplied by 7. For example for item_id 1, field_id 15 it returns 2,2,2,2,2,2,2,6,6,6,6,6,6,6
. I don't know why it is returning result like this.
But if I write a sub query to get last login time, like
SELECT
`eu`.`user_id`, `eu`.`displayname`,
GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city ,
GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END ) AS interests ,
GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END ) AS age,
(SELECT MAX(eul.timestamp) FROM engine4_user_logins AS eul WHERE eul.user_id = eu.user_id) AS last_login
FROM
`engine4_users` AS `eu`
INNER JOIN
`engine4_user_fields_values` AS `eufv`
ON
eu.user_id = eufv.item_id
GROUP BY `eu`.`user_id`
Now this query returns exact result. I first want to ask what is wrong in second query which returns field values wrong. I am totally not getting the point. I don't want sub query.
Please first let me know what is wrong in that query and how can i get exact result without subquery.
Upvotes: 0
Views: 54
Reputation: 108370
To answer your first question, what's happening is that your query is creating a "cross product", when you have multiple matching rows returned from each of the table.
Each matching row from engine4_user_fields_values
is getting matched to each row returned from engine4_user_logins
. The resulting set is a cross product of those two sets.
This isn't a bug in SQL, it's the expected behavior.
Similar to the result we get from this demonstration query:
SELECT a.i, b.j
FROM (SELECT 2 AS i UNION ALL SELECT 3 UNION ALL SELECT 5 UNION ALL SELECT 7) a
JOIN (SELECT 11 AS j UNION ALL SELECT 13 UNION ALL SELECT 17) b
which produces 12 rows (4 rows x 3 rows)
To answer your second question: There's a couple of approaches to dealing with this. One is to avoid creating the cross product, the other approach is to go ahead and produce the cross product and then eliminate duplicates.
Avoiding the cross product would involve either a separate query, or a single query using an inline view (but the inline view is actually a "subquery", and you said you wanted to avoid that.)
But just for the sake of showing how that could be done using a JOIN operation to an inline view (rather than a correlated subquery), here's an example:
SELECT eu.user_id
, eu.displayname
, GROUP_CONCAT( CASE WHEN eufv.field_id = 19 THEN eufv.value END ) AS city
, GROUP_CONCAT( CASE WHEN eufv.field_id = 15 THEN eufv.value END ) AS interests
, GROUP_CONCAT( CASE WHEN eufv.field_id = 6 THEN eufv.value END ) AS age
, ll.last_login
FROM `engine4_users` eu
JOIN `engine4_user_fields_values` eufv
ON eufv.item_id = eu.user_id
LEFT
JOIN ( SELECT eul.user_id
, MAX(eul.timestamp) AS last_login
FROM engine4_user_logins eul
GROUP BY eul.user_id
) ll
ON ll.user_id = eu.user_id
GROUP BY eu.user_id
The inline view aliased as ll will return at most one row per user_id, so a JOIN to that set won't produce any "duplicates". Performance of the inline view query would be optimized with an appropriate index on (user_id, timestamp).
The other approach is to deal with the "duplicate" values returned from the cross product by eliminating duplicates that are produced. One way to do that is to include the DISTINCT keyword inside the GROUP_CONCAT function. But note that this will remove ALL duplicates, not just duplicates introduced by the cross product.
GROUP_CONCAT(DISTINCT expr)
Note that MySQL may still go through the gyrations of producing the cross product, which could end up being rather large if a user has lots of logins, and there are a lot of rows returned from the other table. And then MySQL would have to go through that whole set to pick out the MAX() and pick out the distinct values.
Upvotes: 1