element119
element119

Reputation: 7625

SQL query returns abnormally large count numbers

My schema roughly translates to the following:

reservation: reservation_id, inventory_id
inventory: inventory_id, car_id
car: car_id
feature: feature_id, name
car_feature: car_id, feature_id

The design is relatively simple. A few unnecessary columns are omitted, but the basic idea is that reservations are made for inventory items. Inventory items can be cars, which have car_ids. A car can have many features (and features can apply to several cars).

What I want to do is find the number of reservations made for each type of feature. Example: a table with one column being the feature "name", and another column being the number of reservations made for cars with that feature. For pedantic reasons, I do not want to use any nested queries.

Here is the query that I attempted:

select f.name, count(*) as reservation_count
from reservation r, inventory i, car c, car_feature cf, feature f
where r.inventory_id = i.inventory_id and i.car_id = cf.car_id and cf.feature_id = f.feature_id
group by f.name;

The query correctly returns each named feature in the following result tableset:

-name-    -reservation_count-
spokes     945000
wheels     931000
sunroof    104000
...        ...

However, it takes an abnormally large amount of time to compute (11 seconds for a dataset with a few thousand rows), and the reservation counts are way too high- there are only 16k rentals in the database.

What am I doing wrong?

Upvotes: 0

Views: 1806

Answers (3)

Nicholas Carey
Nicholas Carey

Reputation: 74277

A big chunk of your problem is that you are using old-skool, pre-ANSI/ISO style joins. Which makes it hard to see what you're doing.

Lets take your original query and format it bit, so we can more easily see what's going on:

select f.name   ,
       count(*) as reservation_count
from reservation r  ,
     inventory   i  ,
     car         c  ,
     car_feature cf ,
     feature     f
where r.inventory_id = i.inventory_id
  and i.car_id       = cf.car_id
  and cf.feature_id  = f.feature_id
group by f.name
;

And now, let's translate that into ANSI join syntax:

select f.name   ,
       count(*) as reservation_count
from reservation r
join inventory   i  on i.inventory_id = r.inventory_id
join car         c  on ???
join car_feature cf on cf.car_id      = i.car_id
join feature     f  on f.feature_id   = cf.feature_id
group by f.name

Hmmm...That doesn't look quite right, does it? Somehow, the car table seems to have been omitted from the join criteria.

If we flesh things out in a likely way, using the schema you provided, we get something like this:

select f.name   ,
       count(*) as reservation_count
from reservation r
join inventory   i  on i.inventory_id = r.inventory_id
join car         c  on c.car_id       = i.care_id
join car_feature cf on cf.car_id      = c.car_id
join feature     f  on f.feature_id   = cf.feature_id
group by f.name

This will likely give you the results you want.

You might notice that's it's also a whole lot easier to read and understand.

As an added bonus, sorting out problems is a lot easier to. Change the the statement to select count(*), comment out all the join clauses and the where clause (if any), thus

select --f.name   ,
       --count(*) as reservation_count
count(*)
from reservation r
--join inventory   i  on i.inventory_id = r.inventory_id
--join car         c  on c.car_id       = i.care_id
--join car_feature cf on cf.car_id      = c.car_id
--join feature     f  on f.feature_id   = cf.feature_id
--group by f.name

Running that should give you the raw number reservations.

Adding in the join to the inventory table

select --f.name   ,
       --count(*) as reservation_count
count(*)
from reservation r
join inventory   i  on i.inventory_id = r.inventory_id
--join car         c  on c.car_id       = i.care_id
--join car_feature cf on cf.car_id      = c.car_id
--join feature     f  on f.feature_id   = cf.feature_id
--group by f.name

should give you, I would imagine, the same reservation count. If the number is different from what you expect, there's a good change that your join criteria isn't correct or that you don't prperly understand the cardinality of the relationship involved.

Repeat until you've added all the joined tables back in. Then start uncommenting the where clause one bit at a time, verifying that your numbers are what you expect.

Once you're happy that your query is good, put the columns in the result list back and uncomment the group by clause.

Upvotes: 2

user1269942
user1269942

Reputation: 3852

Try taking "car c," out of your query. This will take out some redundant rows because you do not join to it. Having it included the way you do will just make n-car copies of each feature.

And for speed, try indexing the columns you join on. (see the DESCRIBE comment)

Upvotes: 2

geoandri
geoandri

Reputation: 2428

I think the following query will return the result you need

 Select feature.name, COUNT(reservation.reservation_id) as reservation_count
 from feature
 inner join car_feature on feature.feature_id=car_feature.feature_id
 inner join car  on car_feature.car_id=car.car_id
 inner join inventory on  car.car_id=inventory.car_id
 inner join reservation on inventory.inventory_id = reservation.inventory_id
 GROUP BY feature.name

Upvotes: 2

Related Questions