HasanAlyazidi
HasanAlyazidi

Reputation: 619

sum another column's values where the another column is distinct

Hi StackOverFlow members

reports = A table name.

Database

CREATE TABLE `reports` (
  `id` int(11) NOT NULL auto_increment,
  `report_day_name` varchar(20) NOT NULL,
  `report_day` varchar(20) NOT NULL,
  `report_month` varchar(20) NOT NULL,
  `report_year` varchar(20) NOT NULL,
  `report_result_number` varchar(20) NOT NULL,
  `report_result_text` varchar(20) NOT NULL,
  `report_since` varchar(20) NOT NULL,
  `report_date` varchar(20) NOT NULL,
  `catid` int(11) NOT NULL,
  `subjectid` int(11) NOT NULL,
  `userid` int(11) NOT NULL,
  `groupid` int(11) NOT NULL,
  PRIMARY KEY  (`id`)
) ENGINE=MyISAM  DEFAULT CHARSET=utf8 AUTO_INCREMENT=78 ;

INSERT INTO `reports` VALUES (73, 'day', '14', '1', '1434 h', '5', 'rate', '1234567890', '1434-1-14', 1, 132, 33, 35);
INSERT INTO `reports` VALUES (74, 'day', '12', '2', '1435 h', '4', 'rate', '1234567890', '1434-2-12', 2, 136, 36, 35);
INSERT INTO `reports` VALUES (75, 'day', '14', '1', '1434 h', '2', 'rate', '1354488730', '1434-1-14', 1, 132, 33, 35);
INSERT INTO `reports` VALUES (76, 'day', '12', '2', '1435 h', '4', 'rate', '1354488730', '1434-2-12', 2, 137, 36, 35);
INSERT INTO `reports` VALUES (77, 'day', '12', '2', '1435 h', '1', 'rate', '1354488730', '1434-2-12', 2, 134, 33, 35);

This is the database table:

id  report_result_number    subjectid   userid
73  5                       132         33
74  4                       136         36
75  2                       132         33
76  4                       137         36
77  1                       134         33

I want to SUM(reports.report_result_number) where (reports.subjectid) is DISTINCT

when i run this code..

SELECT
  users.user_id, users.user_name, users.user_country, SUM(reports.report_result_number) AS AllTotal, COUNT(DISTINCT reports.subjectid) AS TotalSubjects
FROM
  users
  INNER JOIN reports ON users.user_id = reports.userid
GROUP BY
  users.user_id
  ORDER BY
  AllTotal DESC LIMIT 4

it returns AllTotal

user_id user_name   user_country    AllTotal    TotalSubjects
36       name         country        8 (correct)        2
33        name        country        8 (not correct)    2

Upvotes: 3

Views: 2134

Answers (1)

spencer7593
spencer7593

Reputation: 108370

The question is open to a couple of interpretations.

If what you want is the value of result_report_number to be included in the SUM aggregate ONLY if there is only ONE row for a given subjectid and userid, (if there is more than one row for the same subjectid, you want to EXCLUDE the report_result_number for all of those rows...

Then something like this will work:

SELECT u.user_id
     , u.user_name
     , u.user_country
     , SUM(s.report_result_number) AS AllTotal
     , COUNT(DISTINCT r.subjectid) AS TotalSubjects
  FROM users u
  JOIN reports r
    ON r.userid = u.user_id
  JOIN ( SELECT d.userid
              , d.subjectid
              , d.report_result_number
           FROM reports d
          GROUP
             BY d.userid
              , d.subjectid
         HAVING COUNT(1) = 1
       ) s
    ON s.userid = r.userid
 GROUP
    BY u.user_id
 ORDER
    BY AllTotal DESC
 LIMIT 4


That's only one (odd) interpretation of the requested result set. Sample data and an expected result set would go a long ways towards clarifying the specification.


For the data you added to your question, this query should be returning, e.g.

36 fee fi   8  2 
33 foo bar  1  2 

There are two rows with a subjectid value of 132 for user 33, so the report_result_number for those rows is excluded from the SUM. There are two distinct values for subjectid (132 and 134), so we're returning a :distinct: count of 2.


If you are asking for the SUM to return a value ONLY if there are no duplicate values for subjectid for a given user...

SELECT u.user_id
     , u.user_name
     , u.user_country
     , IF(COUNT(DISTINCT r.subjectid) = COUNT(r.subjectid)
         ,SUM(r.report_result_number)
         ,NULL
       ) AS AllTotal
     , COUNT(DISTINCT r.subjectid) AS TotalSubjects
  FROM users u
  JOIN reports r
    ON r.userid = u.user_id
 GROUP
    BY u.user_id
 ORDER
    BY AllTotal DESC
 LIMIT 4

Hasan said... "If there are duplicated values [of subjectid for a given userid], get one of them"

Just remove the HAVING clause from the inline view aliased as s. That will return the value of report_result_number for one row. (It will be arbitrary as to which "matching" row the value will be returned from:

SELECT u.user_id
     , u.user_name
     , u.user_country
     , SUM(r.report_result_number) AS AllTotal
     , COUNT(DISTINCT r.subjectid) AS TotalSubjects
  FROM users u
  JOIN ( SELECT d.userid
              , d.subjectid
              , d.report_result_number
           FROM reports d
          GROUP
             BY d.userid
              , d.subjectid
       ) r
    ON r.userid = u.user_id
 GROUP
    BY u.user_id
 ORDER
    BY AllTotal DESC
 LIMIT 4

To make the resultset repeatable, to always get the lowest or highest value, you could add an aggregate function to specify which value to return.

replace...

          , d.report_result_number

with...

          , MAX(d.report_result_number)  AS report_result_number

With the MAX() aggregate, this will return:

36 fee fi   8  2
33 foo bar  6  2

(The query will get the value of '5' for subjectid=132 userid=33, and will omit the value of '2' for the same subjectid.) Absent the MAX aggregate, the query could validly (and arbitrarily) return a '3' in place of the '6'. (It can include either the '5' or the '2', and omit the other.)

SQL Fiddle here

SQL Fiddle including the MAX aggregate here


Q: how can i use (where report_month = 'number') in your code?

A: add the WHERE clause in the inline view, after the FROM clause before the GROUP BY clause. Replace this:

       FROM reports d
      GROUP

with e.g.

       FROM reports d
      WHERE d.report_month = 'number'
      GROUP

Only rows that satisfy the specified predicate will be returned.

Upvotes: 2

Related Questions