manxing
manxing

Reputation: 3335

mysql SELECT JOIN and GROUP BY

Here is my query:

SELECT v2.mac, v2.userag_hash, v2.area, count(*), count(distinct v2.video_id) 
FROM video v2 JOIN (
  SELECT distinct v.mac, v.userag_hash 
  from video v 
  WHERE v.date_pl >= '2012-01-30 00:00' AND 
    v.date_pl <= '2012-02-05 23:55' 
  ORDER BY rand() LIMIT 50
) table2 
ON v2.mac = table2.mac AND 
  v2.userag_hash = table2.userag_hash AND 
  v2.date_pl >= '2012-01-30 00:00' AND 
  v2.date_pl <= '2012-02-05 23:55' 
GROUP BY v2.mac, v2.userag_hash

I have one table "video" in the database, it contains several thousand users' data, now I want to randomly select 50 users and calculate based on the selected rows, (each user is identified by unique combination of ), this query's result is:

 usermac1, userag_hash1, area1, 10, 5
 usermac2, userag_hash2, area2, 20, 8
  ...

But if I don't use "GROUP BY" in the end of the query, then it will return only one row: usermac, userag_hash, areax, 1500, 700 (don't know what this row stands for)

I am wondering if the "1500, 700" is the sum of the last two columns of the previous results. like 1500 = 10+20+... 700 = 5+8+...

Upvotes: 0

Views: 287

Answers (1)

Peter
Peter

Reputation: 3417

Based on the fact that you have only one aggregate function (count) and used on 2 columns, and you can run it without GROUP BY at all, you must be using the non-standards compliant MySQL.

SELECT v2.mac, v2.userag_hash, v2.area, count(*), count(distinct v2.video_id) 
...

Whatever your data is, MySQL will return one row when you use aggregate functions, which is:

<undefined value>, <undefined value>, count of all rows, count of rows where v2.video_id is distinct (and probably non null).

So I think you have 1500 rows, and 700 distinct values of v2.video_id, or 700 non-null distinct values. To test this null idea, try:

count(distinct IFNULL(v2.video_id,'nullvaluehere'))

which will convert nulls to non-null so they will be included.

The "undefined values" could be the first row, last row, first where something is non null, first in an index, first in some cache, etc. There is no definition of what should happen when you write an invalid query.

Every SQL database I'm aware of other than MySQL will give you an error message and not even run the query. For the query to be valid, it must have all non-aggregated columns in the group by. eg. mac and userag_hash must both be in group by.

Upvotes: 1

Related Questions