Carol.Kar
Carol.Kar

Reputation: 5215

Filter group-by based on records per group

I am using mysql Ver 8.0.23-0ubuntu0.20.04.1 for Linux on x86_64 ((Ubuntu)).

I have created the following database:

CREATE TABLE `transaction` (
  `id` bigint unsigned NOT NULL AUTO_INCREMENT,
  `company` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `filling_date` timestamp NULL DEFAULT NULL,
  `trx_type` varchar(191) COLLATE utf8mb4_unicode_ci DEFAULT NULL,
  `amount_range` int DEFAULT NULL,
  `insider_price` decimal(30,4) DEFAULT NULL,
  `qty` int DEFAULT NULL,
  `transaction_value` decimal(30,4) DEFAULT NULL,
  PRIMARY KEY (`id`)
); 

INSERT INTO transaction (id, company, filling_date, trx_type, amount_range, insider_price, qty, transaction_value) VALUES(6, 'Apple', '2021-06-06 16:39:54.000', 'P - Purchase', 1000, 10, 100, 8000);
INSERT INTO transaction (id, company, filling_date, trx_type, amount_range, insider_price, qty, transaction_value) VALUES(7, 'Apple', '2021-06-05 15:29:34.000', 'S - Sale', 2000, 11, 200, 9000);
INSERT INTO transaction (id, company, filling_date, trx_type, amount_range, insider_price, qty, transaction_value) VALUES(8, 'Microsoft', '2021-06-05 11:22:15.000', 'P - Purchase', 2000, 10, 500, 1000);
INSERT INTO transaction (id, company, filling_date, trx_type, amount_range, insider_price, qty, transaction_value) VALUES(9, 'Apple', '2021-05-16 11:29:44.000', 'P - Purchase', 1000, 11, 1000, 10000);
INSERT INTO transaction (id, company, filling_date, trx_type, amount_range, insider_price, qty, transaction_value) VALUES(10, 'Microsoft', '2021-01-10 11:22:15.000', 'P - Purchase', 100, 30, 700, 3000);
INSERT INTO transaction (id, company, filling_date, trx_type, amount_range, insider_price, qty, transaction_value) VALUES(11, 'Microsoft', '2021-06-01 22:22:15.000', 'S - Sale', 6000, 60, 600, 4000);

I would like to filter my group-by statement so that I get all records per group by > 1 (count(Company) > 1).

I tried the following:

select 
  count(Company) AS RecordsPerGroup,
  Company, 
  max(filling_date) Last_filling_date, 
  trx_type,
  sum(amount_range) amount_range,
  sum(insider_price) insider_price,
  sum(qty) qty,
  sum(transaction_value) transaction_value
from transaction
where filling_date >= DATE(NOW()) - INTERVAL 31 DAY 
AND trx_type ='p - purchase'
AND count(Company) > 1
group by company

When I run the above query I get:

Query Error: Error: ER_INVALID_GROUP_FUNC_USE: Invalid use of group function

The error stems from the following statement AND count(Company) > 1, which is not allowed in my query.

Find below my db-fiddle example:

DB-Fiddle Example

I would like to get the following output:

| RecordsPerGroup | Last_filling_date   | amount_range | insider_price | qty  | transaction_value | Company   | trx_type     |
| --------------- | ------------------- | ------------ | ------------- | ---- | ----------------- | --------- | ------------ |
| 2               | 2021-06-06 16:39:54 | 2000         | 21.0000       | 1100 | 18000.0000        | Apple     | P - Purchase |

Any suggestions how to filter my group-by statements to display only all records that have >2 group bys?

I appreciate your replies!

Upvotes: 1

Views: 52

Answers (2)

Tim Biegeleisen
Tim Biegeleisen

Reputation: 522817

The assertion on the count belongs in a HAVING clause, not the WHERE clause, and also if you intend to select trx_type, then this column should also appear in the GROUP BY clause:

SELECT
    COUNT(Company) RecordsPerGroup,
    Company, 
    MAX(filling_date) Last_filling_date,
    trx_type,
    SUM(amount_range) amount_range,
    SUM(insider_price) insider_price,
    SUM(qty) qty,
    SUM(transaction_value) transaction_value
FROM `transaction`
WHERE
    filling_date >= DATE(NOW()) - INTERVAL 31 DAY AND
    trx_type = 'p - purchase' 
GROUP BY
    Company,
    trx_type
HAVING
    COUNT(Company) > 1;

Upvotes: 1

Craig
Craig

Reputation: 1226

When you're using GROUP BY, and you want to treat an aggregation as part of the overall "filter" criteria, use HAVING.

Eg.

SELECT
  count(Company) AS RecordsPerGroup,
  Company, 
  max(filling_date) Last_filling_date, 
  trx_type,
  sum(amount_range) amount_range,
  sum(insider_price) insider_price,
  sum(qty) qty,
  sum(transaction_value) transaction_value
from transaction
where filling_date >= DATE(NOW()) - INTERVAL 31 DAY 
AND trx_type ='p - purchase'
group by company
HAVING COUNT(Company) > 1

See the MySql docs for more reference https://www.mysqltutorial.org/mysql-having.aspx

The HAVING clause, like the WHERE clause, specifies selection conditions. The WHERE clause specifies conditions on columns in the select list, but cannot refer to aggregate functions. The HAVING clause specifies conditions on groups, typically formed by the GROUP BY clause. The query result includes only groups satisfying the HAVING conditions. (If no GROUP BY is present, all rows implicitly form a single aggregate group.)

The HAVING clause is applied nearly last, just before items are sent to the client, with no optimization. (LIMIT is applied after HAVING.)

The SQL standard requires that HAVING must reference only columns in the GROUP BY clause or columns used in aggregate functions. However, MySQL supports an extension to this behavior, and permits HAVING to refer to columns in the SELECT list and columns in outer subqueries as well.

https://dev.mysql.com/doc/refman/8.0/en/select.html

Upvotes: 1

Related Questions