Reputation: 321
I have the following theoretical database:
Hotel(Hotel_No, Name, Address)
Room(Room_No, Hotel_No, Type, Price)
Booking(Hotel_No, Guest_No, Date_From, Date_To, Room_No)
Guest(Guest_No, Name, Address)
Bold attributes serve as the primary keys for that table.
The goal is to find the most commonly booked room type for each hotel in London. The expected end output in my head is the following, where Hotel_No
is the id of a unique hotel in London, and Type
is the type of room a hotel offers:
+----------+----------+-------+
| Hotel_No | Type | Count |
+----------+----------+-------+
| 67 | Single | 53 |
| 45 | Double | 67 |
| 32 | Double | 23 |
| ... | ... | ... |
+----------+----------+-------+
Bookings are counted as the reservations within the bookings table, with each row representing one booking.
The answer I was given is the following:
SELECT Type, H.hotel No
FROM Room R, Hotel H, Booking B
WHERE R.Hotel_No = H.Hotel No
AND H.Address LIKE '%London%'
AND B.Room No = R.Room_No
AND H.Hotel_No = B.Hotel_No
GROUP BY Type, H.hotel_No
HAVING Count(*) >= ALL(
SELECT Count(*)
FROM Booking B2,Room R2
WHERE B2.Hotel_No = H.Hotel_No
AND R2.Room_No = B2.Room_No
GROUP BY R2.Type);
Is this solution correct?
I am having trouble understanding how the nested query is managing to select the highest Type of hotel booking by hotel in London, rather than simply returning the row or rows with the highest booking by type, aggregated over all hotels in London.
Why do we create a new table in the nested query, B2, and R2, but still reference the original H table?
Sample data:
CREATE TABLE Hotel (
Hotel_No INT NOT NULL AUTO_INCREMENT,
Hotel VARCHAR(10),
Address VARCHAR(10),
PRIMARY KEY (Hotel_No)
);
CREATE TABLE Room (
Room_No INT NOT NULL,
Hotel_No INT NOT NULL,
Type ENUM('Single', 'Double', 'Family'),
PRIMARY KEY(Room_No, Hotel_No),
FOREIGN KEY(Hotel_No) REFERENCES Hotel(Hotel_No)
);
CREATE TABLE Booking (
Hotel_No INT NOT NULL,
Guest_No INT NOT NULL,
Date_From DATE NOT NULL,
Room_No INT NOT NULL,
PRIMARY KEY (Hotel_No, Guest_No, Date_From),
FOREIGN KEY (Hotel_No) REFERENCES Hotel(Hotel_No),
FOREIGN KEY (Room_No, Hotel_No) REFERENCES Room(Room_No, Hotel_No)
);
INSERT INTO Hotel (Name, Address) VALUES
('Hotel A', 'London'),
('Hotel B', 'London'),
('Hotel C', 'London'),
('Hotel D', 'Birmingham');
INSERT INTO Room VALUES
(1, 1, 'Single'),
(2, 1, 'Double'),
(1, 2, 'Single'),
(2, 2, 'Double'),
(3, 2, 'Family'),
(1, 3, 'Single'),
(2, 3, 'Double'),
(1, 4, 'Single'),
(2, 4, 'Single');
INSERT INTO Booking VALUES
(1, 1, '2000/01/01', 1),
(1, 2, '2000/02/01', 1),
(1, 3, '2000/01/01', 2),
(1, 4, '2000/02/01', 2),
(1, 5, '2000/03/01', 2),
(2, 6, '2000/01/01', 1),
(2, 7, '2000/01/01', 2),
(2, 8, '2000/01/01', 3),
(2, 9, '2000/02/01', 3),
(2, 11, '2000/01/01', 1),
(3, 12, '2000/01/01', 2),
(3, 13, '2000/02/01', 2);
Upvotes: 2
Views: 519
Reputation: 64476
The solution looks fine, In main query your tables are joined and grouped by hotel and type i believe type comes from rooms table, Based on this grouping you can get the count of bookings per hotel and room type which may give you more than one rows for each hotel depends on the booking of room types that belongs to hotel.
The sub query part is just to pick the rows from above query which have the highest count for each hotel, So if hotel A has following count
A single 10
A double 15
then second row will be returned for hotel A because it has highest count. Same logic goes for each hotel.
Edit for sample data
I have tweaked your posted query to get the hotels and room types with maximum count which is as below.
SELECT TYPE, H.hotel_No,COUNT(*)
FROM Room R, Hotel H, Booking B
WHERE R.Hotel_No = H.Hotel_No
AND H.Address LIKE '%London%'
AND B.Room_No = R.Room_No
AND H.Hotel_No = B.Hotel_No
GROUP BY TYPE, H.hotel_No
HAVING COUNT(*) >= (
SELECT COUNT(*)
FROM Booking B2,Room R2
WHERE B2.Hotel_No = H.Hotel_No
AND R2.Room_No = B2.Room_No
AND R2.Hotel_No = H.Hotel_No
GROUP BY R2.Type
ORDER BY COUNT(*) DESC
LIMIT 1)
ORDER BY H.hotel_No
the original correlated sub query was missing with another filter to match which was R2.Hotel_No = H.Hotel_No
then an ORDER BY COUNT(*) DESC
and LIMIT 1
to get the highest count for each hotel, If is isn't used you will have no data because count of booking excluding room types will be higher as compare to count of bookings per hotel and room type.
So back to question here is what sub query is doing. For each row of parent query the result of subquery can be seen here
TYPE hotel_No COUNT(*) sub_query
------ -------- -------- -----------
Single 1 2 3
Double 1 3 3
Single 2 2 2
Double 2 1 2
Family 2 2 2
Double 3 2 2
In above as you can see For hotel 1 there are 2 rows for room type single booking count is 2 and for room type double booking count is 3 while if you see the result of sub_query in next column the count 3 for both rows. So if you compare the count and result of sub query one by one for hotel 1 first row will be excluded because single booking count is less from the max count. Same logic goes for other rows.
Final result set
TYPE hotel_No COUNT(*)
1 Double 1 3
2 Single 2 2
3 Family 2 2
4 Double 3 2
In final result set you can see there for hotel id 2 there are 2 records for different room type because they both share same highest count
Upvotes: 1