user1052096

Reputation: 883

MySQL group by and inner join

I have an article_views table which holds the number of article views for each day. A new record is created to hold the count for each separate day for each article.

The query below gets the article id and total views for the top 5 most-viewed articles of all time:

SELECT article_id, 
SUM(article_count) as cnt
FROM article_views
GROUP BY article_id
ORDER BY cnt DESC
LIMIT 5 

I also have a separate articles table which holds all the article fields. I want to amend the query above to join to the articles table and get two fields for each article id. I have tried to do this below, but the count is coming back incorrectly:

SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5

I'm not sure exactly what I'm doing wrong. Do I need to use a subquery?

Upvotes: 10

Views: 39569

Answers (3)

spencer7593

Reputation: 108400

Your query looks basically right to me...

But the value returned for cnt depends on the article_id column being UNIQUE in the articles table. (We'd assume that it's the primary key, but absent a schema definition, that's only an assumption.)

Also, we're likely to assume there's a foreign key between the tables, that is, there are no values of article_id in the article_views table which don't match a value of article_id on a row in the articles table.


To check for "orphan" article_id values, run a query like:

SELECT v.article_id
  FROM article_views v
  LEFT
  JOIN articles a
    ON a.article_id = v.article_id
 WHERE a.article_id IS NULL

To check for "duplicate" article_id values in articles, run a query like:

SELECT a.article_id
  FROM articles a
 GROUP BY a.article_id
HAVING COUNT(1) > 1 

If either of those queries returns rows, that could be an explanation for the behavior you observe.
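
To illustrate why duplicates matter, here is a minimal sketch in Python that uses an in-memory SQLite database as a stand-in for MySQL (an assumption made only so the example is self-contained). The sample rows are invented; article 1 deliberately appears twice in articles, so each of its view rows is matched twice by the join and the sum doubles.

```python
import sqlite3

# In-memory SQLite used as a stand-in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER, article_title TEXT, artcile_url TEXT);
    CREATE TABLE article_views (article_id INTEGER, article_count INTEGER);
    -- Hypothetical data: article 1 is duplicated in articles.
    INSERT INTO articles VALUES (1, 'First', '/first'), (1, 'First', '/first'), (2, 'Second', '/second');
    INSERT INTO article_views VALUES (1, 10), (1, 5), (2, 7);
""")

# Totals from article_views alone.
plain = conn.execute("""
    SELECT article_id, SUM(article_count) AS cnt
    FROM article_views
    GROUP BY article_id
    ORDER BY cnt DESC
""").fetchall()

# Same totals after joining to articles: the duplicate row doubles article 1.
joined = conn.execute("""
    SELECT v.article_id, SUM(v.article_count) AS cnt
    FROM article_views v
    INNER JOIN articles a ON a.article_id = v.article_id
    GROUP BY v.article_id
    ORDER BY cnt DESC
""").fetchall()

print(plain)   # [(1, 15), (2, 7)]
print(joined)  # [(1, 30), (2, 7)]
```

If the joined totals in your own data are exact multiples of the unjoined ones, duplicate article_id rows in articles are the likely cause.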

Upvotes: 0

Mahmoud Gamal

Reputation: 79929

Add articles.article_title, articles.artcile_url to the GROUP BY clause:

SELECT 
  article_views.article_id, 
  articles.article_title, 
  articles.artcile_url,
  SUM( article_views.article_count ) AS cnt
FROM article_views
INNER JOIN articles ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id,   
         articles.article_title, 
         articles.artcile_url
ORDER BY cnt DESC
LIMIT 5;

The reason you were not getting the correct result set is that when you select columns that are neither included in the GROUP BY clause nor wrapped in an aggregate function, MySQL (with ONLY_FULL_GROUP_BY disabled) picks an arbitrary value for them from each group.

Upvotes: 16

Gordon Linoff

Reputation: 1269743

You are using a MySQL (mis)feature called Hidden Columns, because the article title is not in the GROUP BY. However, this may or may not be causing your problem.

If the counts are wrong, then I think you have duplicate article_id values in the articles table. You can check this by doing:

select article_id, count(*) as cnt
from articles
group by article_id
having cnt > 1

If any appear, then that is your problem. If they all have different titles, then grouping by the title (as suggested by Mahmoud) would fix the problem.

If not, one way to fix it is the following:

SELECT article_views.article_id, SUM( article_views.article_count ) AS cnt, articles.article_title, articles.artcile_url
FROM article_views INNER JOIN
     (select a.* from articles a group by a.article_id) articles
     ON articles.article_id = article_views.article_id
GROUP BY article_views.article_id
ORDER BY cnt DESC
LIMIT 5

This chooses an arbitrary title for the article.
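
On the original question of whether a subquery is needed: a different approach, not taken from the answers above, is to aggregate article_views in a derived table first and only then join the resulting rows to articles. This avoids selecting non-grouped columns altogether. The sketch below runs that query through Python, with an in-memory SQLite database standing in for MySQL and made-up sample rows (both are assumptions for illustration). It assumes article_id is unique in articles; if it is not, the counts stay correct but the duplicated articles appear more than once in the output.

```python
import sqlite3

# In-memory SQLite used as a stand-in for MySQL (assumption for illustration).
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE articles (article_id INTEGER PRIMARY KEY, article_title TEXT, artcile_url TEXT);
    CREATE TABLE article_views (article_id INTEGER, article_count INTEGER);
    -- Hypothetical sample data.
    INSERT INTO articles VALUES (1, 'First', '/first'), (2, 'Second', '/second');
    INSERT INTO article_views VALUES (1, 10), (1, 5), (2, 20);
""")

# Aggregate and limit inside the derived table t, then join to articles.
rows = conn.execute("""
    SELECT t.article_id, t.cnt, a.article_title, a.artcile_url
    FROM (SELECT article_id, SUM(article_count) AS cnt
          FROM article_views
          GROUP BY article_id
          ORDER BY cnt DESC
          LIMIT 5) t
    INNER JOIN articles a ON a.article_id = t.article_id
    ORDER BY t.cnt DESC
""").fetchall()

print(rows)  # [(2, 20, 'Second', '/second'), (1, 15, 'First', '/first')]
```

The outer ORDER BY is repeated because the ordering inside a derived table is not guaranteed to survive the join.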

Upvotes: 3
