Reputation: 63
I'm with a doubt about joins. For example, using an example database dvdrental, this query:
SELECT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
Some records appear repeated, for example, it appears 3 times "342 Harold Martino" like:
342 Harold Martino
342 Harold Martino
342 Harold Martino
Do you know why it appears repeated records like in this example that appears the same Record 3 times? This repetition means that there are 3 records in the payment table where customer_id = 342? But this query "select * from payment where customer_id = 342" returns 32 records. So I'm not understanding properly how the join works.
Upvotes: 1
Views: 1190
Reputation: 16669
There are many resources around this, so to be short your expression says this in plain english:
Get all the records from the customer table
Then for each of those records, get every payment record that has the same value in the customer_Id
field.
return a single row for each payment record that duplicates all the fields from the customer record for each row in the payment record.
Finally, only return 3 columns:
customer_id
column from the customer
tablefirst_name
column that is in one of the customer
or payment
tablelast_name
column that is in one of the customer
or payment
tableNote that we didn't bring back any columns from the
payment
table... (I assume first_name and last_name are fields in thecustomer
table...)
Keep in mind, a CROSS JOIN
(or a FULL OUTER JOIN
) is a join that says take all fields from the left side and create a single row combination that is multiplied by the right side, so for every row on the left, return a combination of the left row with every row on the right. So the number of rows returned in a CROSS JOIN
is the number of rows in the current table, multiplied by the number of rows in the joined table.
In your query, an INNER JOIN
or LEFT INNER JOIN
will produce a recordset that includes all the fields from the current table structure and will include fields from the joined table as well.
the implicit LEFT
component specifies that only records that match the existing table structure should be returned, so in this case only Payment
records that match a customer_id
in the currently not filtered customer
table will be returned.
The number of resulting rows is the number of rows in the joined table that have a match in the current table.
If instead you want to query:
Select all the customers that have payments
then you can use a join, but you should also use a DISTINCT
clause, to only return the unique records:
SELECT DISTINCT customer.customer_id
, first_name
, last_name
FROM customer
INNER JOIN payment ON Customer.customer_id = Payment.customer_id
An alternative way to do this is to use a sub-query instead of a join:
SELECT customer_id
, first_name
, last_name
FROM customer
WHERE EXISTS (SELECT customer_id FROM payment WHERE payment.customer_id = customer.customer_id)
The rules on when to use one style of query over the other are pretty convoluted and very dependant on the number of rows in each table, the types of indexes that are available and even the type or version of the RDBMS you are operating within.
To optimise, run both queries, compare the results and timing and use the one that fits your database better. If later performance becomes an issue, try the other one :)
Select the Customer_id
field
Upvotes: 2