Brandon Longley
Brandon Longley

Reputation: 13

How to remove null values from a count function

I'm having trouble with my results of my count function, i think its due to the count(a1_journal.publisher_id) >=3 however instead of removing the results that are <3 from display it just gives them a null value.

Is there a way to either remove the null values from display (i tried a where col IS NOT NULL but it didn't work properly) or stop them from being selected in the count?

SELECT publisher_name, (select count(a1_journal.publisher_id)
FROM a1_journal
WHERE a1_journal.publisher_id=a1_publisher.publisher_id 
HAVING count(a1_journal.publisher_id) >=3)
AS Number_of_Journals
FROM a1_publisher
ORDER by Number_of_Journals DESC;

Thanks

Upvotes: 1

Views: 2883

Answers (2)

hgulyan
hgulyan

Reputation: 8239

The problem is that you return all rows in table a1_publisher. Try this instead.

select j.publisher_id, count(j.publisher_id)
FROM a1_journal j inner join a1_publisher p ON  j.publisher_id=p.publisher_id 
GROUP BY j.publisher_id
HAVING count(j.publisher_id) >=3
ORDER BY count(j.publisher_id) DESC

UPDATE:

To select publisher's name there're 2 ways.

  1. If publisher's name is unique you can add the column to group by like this

    select j.publisher_id,p.publisher_name, count(j.publisher_id)
    FROM a1_journal j 
      inner join a1_publisher p ON  j.publisher_id=p.publisher_id 
    GROUP BY j.publisher_id, p.publisher_name
    HAVING count(j.publisher_id) >=3
    ORDER BY count(j.publisher_id) DESC
    
  2. If it's not unique, you should have another join with a1_publisher like this.

    SELECT aj.publisher_id, aj.numberOfJournals, ap.publisher_name
    FROM a1_publisher ap 
    INNER JOIN (
        SELECT j.publisher_id, count(j.publisher_id) numberOfJournals
        FROM a1_journal j 
           inner join a1_publisher p ON  j.publisher_id=p.publisher_id 
        GROUP BY j.publisher_id
        HAVING count(j.publisher_id) >=3  ) aj 
    ON ap.publisher_id = ap.publisher_id
    ORDER BY count(j.publisher_id) DESC
    

Upvotes: 3

cairnz
cairnz

Reputation: 3957

Your query, in the way you've made it doesn't care about the COUNT inside the subselect. It'll be processed for every publisher_name no matter what. It is also inefficient in terms of how it's processing. Performance if the data sets are very large would most likely be very noticeable.

You have several ways to fix this. One way is to rewrite to a join and use the COUNT() function to aggregate the number of journals, like this:

SELECT
    a1_publisher.publisher_name,
    COUNT(a1_journal.publisher_id) AS Number_of_Journals
FROM
    a1_publisher
INNER JOIN a1_publisher
    ON a1_journal.publisher_id = a1_publisher.publisher_id
GROUP BY
    a1_publisher.publisher_name
HAVING
    COUNT(a1_journal.publisher_id) >= 3 -- Here is your filter.
ORDER BY
    COUNT(a1_journal.publisher_id) DESC, a1_publisher.publisher_name ;

Now, if you want to find the publishers without any journals, you can change it slightly:

SELECT
    a1_publisher.publisher_name
FROM
    a1_publisher
LEFT  JOIN a1_publisher
    ON a1_journal.publisher_id = a1_publisher.publisher_id
WHERE
    a1_journal.pubhslier_id IS NULL
ORDER BY
    a1_publisher.publisher_name

For performance reasons, have an index on a1_journal.publisher_id, and one on a1_publisher.publisher_id (though i suspect this is already the primary key)

Upvotes: 0

Related Questions