John
John

Reputation: 63

Why do repeated values appear in SQL results

I'm with a doubt about joins. For example, using an example database dvdrental, this query:

SELECT customer.customer_id
       , first_name
       , last_name
FROM customer 
INNER JOIN payment ON Customer.customer_id = Payment.customer_id

Some records appear repeated, for example, it appears 3 times "342 Harold Martino" like:

342   Harold   Martino

342   Harold   Martino

342   Harold   Martino

Do you know why it appears repeated records like in this example that appears the same Record 3 times? This repetition means that there are 3 records in the payment table where customer_id = 342? But this query "select * from payment where customer_id = 342" returns 32 records. So I'm not understanding properly how the join works.

Upvotes: 1

Views: 1190

Answers (1)

Chris Schaller
Chris Schaller

Reputation: 16669

There are many resources around this, so to be short your expression says this in plain english:

Get all the records from the customer table Then for each of those records, get every payment record that has the same value in the customer_Id field. return a single row for each payment record that duplicates all the fields from the customer record for each row in the payment record. Finally, only return 3 columns:

  • the customer_id column from the customer table
  • the first_name column that is in one of the customer or payment table
  • the last_name column that is in one of the customer or payment table

Note that we didn't bring back any columns from the payment table... (I assume first_name and last_name are fields in the customer table...)

Keep in mind, a CROSS JOIN (or a FULL OUTER JOIN) is a join that says take all fields from the left side and create a single row combination that is multiplied by the right side, so for every row on the left, return a combination of the left row with every row on the right. So the number of rows returned in a CROSS JOIN is the number of rows in the current table, multiplied by the number of rows in the joined table.

In your query, an INNER JOIN or LEFT INNER JOIN will produce a recordset that includes all the fields from the current table structure and will include fields from the joined table as well. the implicit LEFT component specifies that only records that match the existing table structure should be returned, so in this case only Payment records that match a customer_id in the currently not filtered customer table will be returned.

The number of resulting rows is the number of rows in the joined table that have a match in the current table.


If instead you want to query:

Select all the customers that have payments

then you can use a join, but you should also use a DISTINCT clause, to only return the unique records:

SELECT DISTINCT customer.customer_id
       , first_name
       , last_name
FROM customer 
INNER JOIN payment ON Customer.customer_id = Payment.customer_id

An alternative way to do this is to use a sub-query instead of a join:

SELECT customer_id
       , first_name
       , last_name
FROM customer 
WHERE EXISTS (SELECT customer_id FROM payment WHERE payment.customer_id = customer.customer_id)

The rules on when to use one style of query over the other are pretty convoluted and very dependant on the number of rows in each table, the types of indexes that are available and even the type or version of the RDBMS you are operating within.

To optimise, run both queries, compare the results and timing and use the one that fits your database better. If later performance becomes an issue, try the other one :)

Select the Customer_id field

Upvotes: 2

Related Questions