Reputation: 65
Say I have two tables:
table A with 6000 records (i.e. T(A) = 6000)
table B with 400 000 records (i.e. T(B) = 400000)
for some reason, I decided that for my final query I would need to join A with B twice, and that I would do this (presumably) very inefficiently via the Cartesian product. So I would compute A × B × B, i.e. T(A) * T(B) * T(B) = 6000 * 400000 * 400000 ≈ a quadrillion records processed internally (only to be stripped down to a few dozen via selection and projection, for example).
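For concreteness, a minimal sketch of the kind of query I mean (the table layout and column names are just placeholders, not a real schema):

```sql
-- Hypothetical sketch: A joined with B twice via the comma (Cartesian) syntax.
SELECT A.id, B1.val, B2.val
FROM A, B AS B1, B AS B2      -- Cartesian product: T(A) * T(B) * T(B) combinations
WHERE A.id = B1.a_id          -- predicates applied (logically) after the product
  AND B1.id = B2.b_id;
```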
While perhaps inefficient, would an average server handle this? If so, is there any limit, even a theoretical one? What if the tables were orders of magnitude bigger?
Upvotes: 0
Views: 284
Reputation: 1269753
You are confusing the logical model of processing with what actually happens inside the database.
Projection, selection, and Cartesian products are concepts from relational algebra. They describe what SQL does; they do not describe how databases do it.
In particular, databases have lots of algorithms that support joining and aggregating tables. Databases also have auxiliary data structures -- in particular, indexes and partitions -- that allow further optimization.
If you have no join conditions, filtering, or aggregation, then the database does need to generate the Cartesian product -- and that can be quite expensive.
In general, though, databases do not generate the Cartesian product. If they did, databases would not be very useful.
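As an illustration (with an assumed schema, not one from your question), the same logical result written with explicit join conditions lets the optimizer use indexes or hash/merge joins rather than materializing the product:

```sql
-- Sketch only: logically still A x B x B restricted by predicates, but the
-- planner can evaluate it as two indexed joins and never builds the
-- quadrillion-row intermediate result.
SELECT A.id, B1.val, B2.val
FROM A
JOIN B AS B1 ON B1.a_id = A.id    -- an index on B(a_id) makes this a lookup
JOIN B AS B2 ON B2.b_id = B1.id
WHERE A.status = 'active';
```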
Is there a limit on the size of data or processing? Practical limits are more common than hard limits in the databases themselves. In general, available memory and disk space limit the size of the data that can be processed -- but that limit is typically much, much higher than your example.
Upvotes: 1
Reputation: 29629
Your question is hypothetical, and may lead to opinion-based answers, but I'll give it a shot.
You say that from your Cartesian product you intend to return only a dozen or so records. If those records can be found through indexes, an "average" server should be absolutely fine - it doesn't matter how many entries are in the phone book; as long as your search is by last name, your search is fast. If you're searching 2 phone books for 2 last names, still fine.
If the 12 rows you need have to be found by simple comparison of non-indexed fields, it's probably still fine - the largest table is only 400K rows, and scanning that should be pretty fast. If you're searching for a street name in the phone book, the size of the phone book matters, but modern hardware should be OK. Better to put an index on the column, though.
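Something along these lines, for example (table and column names are only illustrative):

```sql
-- Illustrative only: an index on the filtered column lets the server find the
-- handful of matching rows without scanning all 400K.
CREATE INDEX idx_b_street ON B (street_name);

SELECT *
FROM B
WHERE street_name = 'Main St';   -- resolved via the index, not a full scan
```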
If you have to find the 12 rows by doing some kind of calculation on fields, it will likely be a problem. If you have to convert every last name in the phone book to an integer and multiply it by the day of the month to find the 12 rows you seek, the server has to evaluate that calculation for every combination of rows - on the order of a quadrillion calculations - and that is likely going to be slow.
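A sketch of the kind of predicate that forces this (the column names are made up for illustration):

```sql
-- A computed condition spanning both sides of the join can't be answered from
-- an index, so in the worst case it is evaluated once per row combination.
SELECT A.id, B1.val, B2.val
FROM A, B AS B1, B AS B2
WHERE LENGTH(B1.last_name) * EXTRACT(DAY FROM A.created_at) = B2.code;
```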
Upvotes: 1