sum another column's values where the another column is distinct

Question

Hi StackOverFlow members

reports = A table name.

Database

CREATE TABLE `reports` (
  `id` int(11) NOT NULL auto_increment,
  `report_day_name` varchar(20) NOT NULL,
  `report_day` varchar(20) NOT NULL,
  `report_month` varchar(20) NOT NULL,
  `report_year` varchar(20) NOT NULL,
  `report_result_number` varchar(20) NOT NULL,
  `report_result_text` varchar(20) NOT NULL,
  `report_since` varchar(20) NOT NULL,
  `report_date` varchar(20) NOT NULL,
  `catid` int(11) NOT NULL,
  `subjectid` int(11) NOT NULL,
  `userid` int(11) NOT NULL,
  `groupid` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=78 ;

INSERT INTO `reports` VALUES (73, 'day', '14', '1', '1434 h', '5', 'rate', '1234567890', '1434-1-14', 1, 132, 33, 35);
INSERT INTO `reports` VALUES (74, 'day', '12', '2', '1435 h', '4', 'rate', '1234567890', '1434-2-12', 2, 136, 36, 35);
INSERT INTO `reports` VALUES (75, 'day', '14', '1', '1434 h', '2', 'rate', '1354488730', '1434-1-14', 1, 132, 33, 35);
INSERT INTO `reports` VALUES (76, 'day', '12', '2', '1435 h', '4', 'rate', '1354488730', '1434-2-12', 2, 137, 36, 35);
INSERT INTO `reports` VALUES (77, 'day', '12', '2', '1435 h', '1', 'rate', '1354488730', '1434-2-12', 2, 134, 33, 35);

This is the database table:

id  report_result_number    subjectid   userid
73  5                       132         33
74  4                       136         36
75  2                       132         33
76  4                       137         36
77  1                       134         33

I want to SUM(reports.report_result_number) where (reports.subjectid) is DISTINCT

when i run this code..

SELECT
  users.user_id, users.user_name, users.user_country, SUM(reports.report_result_number) AS AllTotal, COUNT(DISTINCT reports.subjectid) AS TotalSubjects
FROM
  users
  INNER JOIN reports ON users.user_id = reports.userid
GROUP BY
  users.user_id
  ORDER BY
  AllTotal DESC LIMIT 4

it returns AllTotal

user_id user_name   user_country    AllTotal    TotalSubjects
36       name         country        8 (correct)        2
33        name        country        8 (not correct)    2

spencer7593 · Accepted Answer

The question is open to a couple of interpretations.

If what you want is the value of result_report_number to be included in the SUM aggregate ONLY if there is only ONE row for a given subjectid and userid, (if there is more than one row for the same subjectid, you want to EXCLUDE the report_result_number for all of those rows...

Then something like this will work:

SELECT u.user_id , u.user_name , u.user_country , SUM(s.report_result_number) AS AllTotal , COUNT(DISTINCT r.subjectid) AS TotalSubjects FROM users u JOIN reports r ON r.userid = u.user_id JOIN ( SELECT d.userid , d.subjectid , d.report_result_number FROM reports d GROUP BY d.userid , d.subjectid HAVING COUNT(1) = 1 ) s ON s.userid = r.userid GROUP BY u.user_id ORDER BY AllTotal DESC LIMIT 4

That's only one (odd) interpretation of the requested result set. Sample data and an expected result set would go a long ways towards clarifying the specification.

For the data you added to your question, this query should be returning, e.g.

36 fee fi   8  2 
33 foo bar  1  2

There are two rows with a subjectid value of 132 for user 33, so the report_result_number for those rows is excluded from the SUM. There are two distinct values for subjectid (132 and 134), so we're returning a :distinct: count of 2.

~~If you are asking for the SUM to return a value ONLY if there are no duplicate values for subjectid for a given user...~~

SELECT u.user_id , u.user_name , u.user_country , IF(COUNT(DISTINCT r.subjectid) = COUNT(r.subjectid) ,SUM(r.report_result_number) ,NULL ) AS AllTotal , COUNT(DISTINCT r.subjectid) AS TotalSubjects FROM users u JOIN reports r ON r.userid = u.user_id GROUP BY u.user_id ORDER BY AllTotal DESC LIMIT 4

Hasan said... "If there are duplicated values [of subjectid for a given userid], get one of them"

Just remove the HAVING clause from the inline view aliased as s. That will return the value of report_result_number for one row. (It will be arbitrary as to which "matching" row the value will be returned from:

SELECT u.user_id
     , u.user_name
     , u.user_country
     , SUM(r.report_result_number) AS AllTotal
     , COUNT(DISTINCT r.subjectid) AS TotalSubjects
  FROM users u
  JOIN ( SELECT d.userid
              , d.subjectid
              , d.report_result_number
           FROM reports d
          GROUP
             BY d.userid
              , d.subjectid
       ) r
    ON r.userid = u.user_id
 GROUP
    BY u.user_id
 ORDER
    BY AllTotal DESC
 LIMIT 4

To make the resultset repeatable, to always get the lowest or highest value, you could add an aggregate function to specify which value to return.

replace...

          , d.report_result_number

with...

          , MAX(d.report_result_number)  AS report_result_number

With the MAX() aggregate, this will return:

36 fee fi   8  2
33 foo bar  6  2

(The query will get the value of '5' for subjectid=132 userid=33, and will omit the value of '2' for the same subjectid.) Absent the MAX aggregate, the query could validly (and arbitrarily) return a '3' in place of the '6'. (It can include either the '5' or the '2', and omit the other.)

SQL Fiddle here

SQL Fiddle including the MAX aggregate here

Q: how can i use (where report_month = 'number') in your code?

A: add the WHERE clause in the inline view, after the FROM clause before the GROUP BY clause. Replace this:

       FROM reports d
      GROUP

with e.g.

       FROM reports d
      WHERE d.report_month = 'number'
      GROUP

Only rows that satisfy the specified predicate will be returned.

sum another column's values where the another column is distinct

Answers (1)

SQL Fiddle here

SQL Fiddle including the MAX aggregate here

Related Questions

sum another column&#39;s values where the another column is distinct

Answers (1)

SQL Fiddle here

SQL Fiddle including the MAX aggregate here

Related Questions

sum another column's values where the another column is distinct