Not selecting duplicates in join / where query

Question

I've been trying to learn MySQL, and I'm having some trouble creating a join query to not select duplicates.

Basically, here's where I'm at :

SELECT atable.phonenumber, btable.date
FROM btable
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4

However, in my database, there is the possibility of having duplicate rows in column atable.phonenumber.

For example (added asterisks for clarity)

  phonenumber |   date
 -------------|-----------
*555-681-2105 | 2015-08-12
 555-425-5161 | 2015-08-15
 331-484-7784 | 2015-08-17
*555-681-2105 | 2015-08-25

.. and so on.

I tried using SELECT DISTINCT but that doesn't work. I also was looking through other solutions which recommended GROUP BY, but that threw an error, most likely because of my WHERE clause and condition. Not really sure how I can easily accomplish this.

Caius Jard · Accepted Answer

DISTINCT applies to the whole row being returned, essentially saying "I want only unique rows" - any row value may participate in making the row unique

You are getting phone numbers duplicated because you're only looking at the column in isolation. The database is looking at phone number and also date. The rows you posted have different dates, and these hence cause the rows to be different

I suggest you do as the commenter recommended and decide what you want to do with the dates. If you want the latest date for a phone number, do this:

SELECT atable.phonenumber, max(btable.date)
FROM battle
LEFT JOIN atable ON btable.id = atable.id
WHERE btable.country_id = 4
GROUP BY atable.phonenumber

When you write a query that uses grouping, you will get a set of rows where there is only one set of value combinations for anything that is in the group by list. In this case, only unique phone numbers. But, because you want other values as well (I.e. Date) you MUST use what's called an aggregate function, to specify what you want to do with all the various values that aren't part of the unique set. Sometimes it will be MAX or MIN, sometimes it will be SUM, COUNT, AVG and so on.

if you're familiar with hash tables or dictionaries from elsewhere in programming, this is what a group by is: it maps a set of values (a key) to a list of rows that have those key values, and then the aggregating function is applied to any of the values in the list associated with the key

The simple rule when using group by (and one that MySQL will do implicitly for you) is to write queries thus:

SELECT 
  List,
  of,
  columns,
  you,
  want,
  in,
  unique,
  combination,
  FN(List),
  FN(of),
  FN(columns),
  FN(you),
  FN(want),
  FN(aggregating) 
FROM table
GROUP BY
  List,
  of,
  columns,
  you,
  want,
  in,
  unique,
  combination

i.e. You can copy paste from your select list to your group list. MySQL does this implicitly for you if you don't do it (i.e. If you use one or more aggregate functions like max in your select list, but forget or omit the group by clause- it will take everything that isn't in an agggregate function and run the grouping as if you'd written it). Whether group by is hence largely redundant is often debated, but there do exist other things you can do with a group by, such as rollup, cube and grouping sets. Also you can group on a column, if that column is used in a deterministic function, without having to group on the result of he deterministic function. Whether there is any point to doing so is a debate for another time :)

Not selecting duplicates in join / where query

Answers (2)

Related Questions