Olivier
Olivier

Reputation: 321

Advanced SELECT query (HAVING)

I have the following theoretical database:

Hotel(Hotel_No, Name, Address)
Room(Room_No, Hotel_No, Type, Price)
Booking(Hotel_No, Guest_No, Date_From, Date_To, Room_No)
Guest(Guest_No, Name, Address)

Bold attributes serve as the primary keys for that table.

The goal is to find the most commonly booked room type for each hotel in London. The expected end output in my head is the following, where Hotel_No is the id of a unique hotel in London, and Type is the type of room a hotel offers:

+----------+----------+-------+
| Hotel_No |   Type   | Count |
+----------+----------+-------+
|       67 | Single   |    53 |
|       45 | Double   |    67 |
|       32 | Double   |    23 |
|      ... | ...      |   ... |
+----------+----------+-------+

Bookings are counted as the reservations within the bookings table, with each row representing one booking.

The answer I was given is the following:

SELECT Type, H.hotel No
FROM Room R, Hotel H, Booking B
WHERE R.Hotel_No = H.Hotel No 
AND H.Address LIKE '%London%'
AND B.Room No = R.Room_No 
AND H.Hotel_No = B.Hotel_No 
GROUP BY Type, H.hotel_No
HAVING Count(*) >= ALL(
    SELECT Count(*)
    FROM Booking B2,Room R2
    WHERE B2.Hotel_No = H.Hotel_No 
    AND R2.Room_No = B2.Room_No 
    GROUP BY R2.Type);

Is this solution correct?

I am having trouble understanding how the nested query is managing to select the highest Type of hotel booking by hotel in London, rather than simply returning the row or rows with the highest booking by type, aggregated over all hotels in London.

Why do we create a new table in the nested query, B2, and R2, but still reference the original H table?

Sample data:

CREATE TABLE Hotel (
  Hotel_No INT NOT NULL AUTO_INCREMENT,
  Hotel VARCHAR(10),
  Address VARCHAR(10),
  PRIMARY KEY (Hotel_No)
);

CREATE TABLE Room (
  Room_No INT NOT NULL,
  Hotel_No INT NOT NULL,
  Type ENUM('Single', 'Double', 'Family'),
  PRIMARY KEY(Room_No, Hotel_No),
  FOREIGN KEY(Hotel_No) REFERENCES Hotel(Hotel_No)
);

CREATE TABLE Booking (
  Hotel_No INT NOT NULL,
  Guest_No INT NOT NULL,
  Date_From DATE NOT NULL,
  Room_No INT NOT NULL,
  PRIMARY KEY (Hotel_No, Guest_No, Date_From),
  FOREIGN KEY (Hotel_No) REFERENCES Hotel(Hotel_No),
  FOREIGN KEY (Room_No, Hotel_No) REFERENCES Room(Room_No, Hotel_No)
);

INSERT INTO Hotel (Name, Address) VALUES 
('Hotel A', 'London'),
('Hotel B', 'London'),
('Hotel C', 'London'),
('Hotel D', 'Birmingham');

INSERT INTO Room VALUES
(1, 1, 'Single'),
(2, 1, 'Double'),
(1, 2, 'Single'),
(2, 2, 'Double'),
(3, 2, 'Family'),
(1, 3, 'Single'),
(2, 3, 'Double'),
(1, 4, 'Single'),
(2, 4, 'Single');

  INSERT INTO Booking VALUES
  (1, 1, '2000/01/01', 1),
  (1, 2, '2000/02/01', 1),
  (1, 3, '2000/01/01', 2),
  (1, 4, '2000/02/01', 2),
  (1, 5, '2000/03/01', 2),
  (2, 6, '2000/01/01', 1),
  (2, 7, '2000/01/01', 2),
  (2, 8, '2000/01/01', 3),
  (2, 9, '2000/02/01', 3),
  (2, 11, '2000/01/01', 1),
  (3, 12, '2000/01/01', 2),
  (3, 13, '2000/02/01', 2);

Upvotes: 2

Views: 519

Answers (1)

M Khalid Junaid
M Khalid Junaid

Reputation: 64476

The solution looks fine, In main query your tables are joined and grouped by hotel and type i believe type comes from rooms table, Based on this grouping you can get the count of bookings per hotel and room type which may give you more than one rows for each hotel depends on the booking of room types that belongs to hotel.

The sub query part is just to pick the rows from above query which have the highest count for each hotel, So if hotel A has following count

A single 10
A double 15

then second row will be returned for hotel A because it has highest count. Same logic goes for each hotel.

Edit for sample data

I have tweaked your posted query to get the hotels and room types with maximum count which is as below.

SELECT TYPE, H.hotel_No,COUNT(*) 
FROM Room R, Hotel H, Booking B
WHERE R.Hotel_No = H.Hotel_No 
AND H.Address LIKE '%London%'
AND B.Room_No = R.Room_No 
AND H.Hotel_No = B.Hotel_No 
GROUP BY TYPE, H.hotel_No
HAVING COUNT(*) >=  (
    SELECT COUNT(*)
    FROM Booking B2,Room R2
    WHERE B2.Hotel_No = H.Hotel_No 
    AND R2.Room_No = B2.Room_No
    AND R2.Hotel_No = H.Hotel_No 
    GROUP BY  R2.Type
    ORDER BY COUNT(*) DESC 
    LIMIT 1)
ORDER BY H.hotel_No

the original correlated sub query was missing with another filter to match which was R2.Hotel_No = H.Hotel_No then an ORDER BY COUNT(*) DESC and LIMIT 1 to get the highest count for each hotel, If is isn't used you will have no data because count of booking excluding room types will be higher as compare to count of bookings per hotel and room type.

So back to question here is what sub query is doing. For each row of parent query the result of subquery can be seen here

TYPE    hotel_No  COUNT(*)  sub_query  
------  --------  --------  -----------
Single         1         2            3
Double         1         3            3
Single         2         2            2
Double         2         1            2
Family         2         2            2
Double         3         2            2

In above as you can see For hotel 1 there are 2 rows for room type single booking count is 2 and for room type double booking count is 3 while if you see the result of sub_query in next column the count 3 for both rows. So if you compare the count and result of sub query one by one for hotel 1 first row will be excluded because single booking count is less from the max count. Same logic goes for other rows.

Final result set

TYPE    hotel_No    COUNT(*)

1   Double  1   3
2   Single  2   2
3   Family  2   2
4   Double  3   2

In final result set you can see there for hotel id 2 there are 2 records for different room type because they both share same highest count

DEMO

Upvotes: 1

Related Questions