Chris Mccabe
Chris Mccabe

Reputation: 1951

how to group by with a sql subqueries

I can't think clearly at the moment, I want to return counts by station_id, an example of output would be:

station 1 has 3 fb post, 6 linkedin posts, 5 email posts station 2 has 3 fb post, 6 linkedin posts, 5 email posts

So I need to group by the station id, my table structure is

CREATE TABLE IF NOT EXISTS `posts` (
  `post_id` bigint(11) NOT NULL auto_increment,
  `station_id` varchar(25) NOT NULL,
  `user_id` varchar(25) NOT NULL,
  `dated` datetime NOT NULL,
  `type` enum('fb','linkedin','email') NOT NULL,
  PRIMARY KEY  (`post_id`)
) ENGINE=MyISAM  DEFAULT CHARSET=latin1 AUTO_INCREMENT=x ;

The query I have so far is returning station 0 as having 2 linkedin posts when it has one (2 in the db tho)

SELECT Station_id, (select count(*) FROM posts WHERE type = 'linkedin') AS linkedin_count, (select count(*) FROM posts WHERE type = 'fb') AS fb_count, (select count(*) FROM posts WHERE type = 'email') AS email_count  FROM `posts` GROUP BY station_id;

Upvotes: 4

Views: 7142

Answers (4)

MatBailie
MatBailie

Reputation: 86798

As implied by gnif's answer, having three correlated sub_queries has a performance over-head. Depending on the DBMS you're using, it could perform similarly to having a self join three times.

gnif's methodology ensures that the table is only parsed once, without the need for joins, correlated sub_queries, etc.

The immediately obvious down-side of gnif's answer is that you don't ever get records for 0's. If there are no fb types, you just don't get a record. If that is not an issue, I'd go with his answer. If it is an issue, however, here is a version with similar methodology to gnif, but matching your output format...

SELECT
  station_id,
  SUM(CASE WHEN type = 'linkedin' THEN 1 ELSE 0 END) AS linkedin_count,
  SUM(CASE WHEN type = 'fb'       THEN 1 ELSE 0 END) AS fb_count,
  SUM(CASE WHEN type = 'email'    THEN 1 ELSE 0 END) AS email_count
FROM
  posts
GROUP BY
  station_id

Upvotes: 2

Geoffrey
Geoffrey

Reputation: 11374

Or, the fastest way, avoiding joins and subselects to get it in the exact format you want:

SELECT
  station_id,
  SUM(CASE WHEN type = 'linkedin' THEN 1 ELSE 0 END) AS 'linkedin',
  SUM(CASE WHEN type = 'fb'       THEN 1 ELSE 0 END) AS 'fb',
  SUM(CASE WHEN type = 'email'    THEN 1 ELSE 0 END) AS 'email'
FROM posts
GROUP BY station_id;

Outputs:

+------------+----------+------+-------+
| station_id | linkedin | fb   | email |
+------------+----------+------+-------+
| 1          |        3 |    2 |     5 |
| 2          |        2 |    0 |     1 |
+------------+----------+------+-------+

You may also want to put an index on there to speed it up

ALTER TABLE posts ADD INDEX (station_id, type);

Explain output:

+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
| id | select_type | table | type  | possible_keys | key        | key_len | ref  | rows | Extra       |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+
|  1 | SIMPLE      | posts | index | NULL          | station_id | 28      | NULL |   13 | Using index |
+----+-------------+-------+-------+---------------+------------+---------+------+------+-------------+

Upvotes: 14

Geoffrey
Geoffrey

Reputation: 11374

Give this a go:

SELECT station_id, type, count(*) FROM posts GROUP BY station_id, type

The output format will be a little different to what your attempting to get, but it should provide the statistics your trying to retrieve. Also since its a single query it is much faster.

-- Edit, added example result set

+------------+----------+----------+
| station_id | type     | count(*) |
+------------+----------+----------+
| 1          | fb       |        2 |
| 1          | linkedin |        3 |
| 1          | email    |        5 |
| 2          | linkedin |        2 |
| 2          | email    |        1 |
+------------+----------+----------+

Upvotes: 1

Sascha Galley
Sascha Galley

Reputation: 16091

try this:

SELECT p.Station_id, 
(select count(*) FROM posts WHERE type = 'linkedin' and station_id=p.station_id) AS linkedin_count, 
(select count(*) FROM posts WHERE type = 'fb' and station_id=p.station_id) AS fb_count, 
(select count(*) FROM posts WHERE type = 'email' and station_id=p.station_id) AS email_count 
FROM `posts` p GROUP BY station_id

Upvotes: 0

Related Questions