Reputation: 411
If I have a table of, say, blog posts, with columns such as post_id and author_id, and I run the SQL query "SELECT * FROM post_table where author_id = 34", what is the computational complexity of that query? Would it simply look through each row and check whether it has the correct author_id, which would be O(n), or does it do something more efficient?
I'm asking because I'm in a situation where I could either search an SQL database with this data, or load an XML file with a list of posts and search through those, and I was wondering which would be faster.
Upvotes: 7
Views: 5089
Reputation: 1270483
There are two basic ways that such a simple query would be executed.
The first is to do a full table scan. This would have O(n) performance.
The second is to look up the value in an index, then load the page, and return the results. The index lookup should be O(log(n)). Loading the page should be O(1).
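For example, the index path is only available if an index actually exists on author_id. A minimal sketch, using the table and column names from the question (the index name is just an illustration):

-- Hypothetical secondary index on author_id; with it, the engine can locate
-- matching rows via an O(log(n)) index lookup instead of scanning every row.
CREATE INDEX idx_post_author ON post_table (author_id);

-- The query from the question can then be answered through the index.
SELECT * FROM post_table WHERE author_id = 34;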
With a more complicated query, it would be hard to make such a general statement. But any SQL engine is generally going to take one of these two paths. Oh, there is a third option if the table is partitioned on author_id, but you are probably not interested in that.
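For completeness, partitioning on author_id might look something like this (MySQL-style syntax, purely illustrative; details vary by engine). The engine then only has to read the one partition that can contain author_id = 34:

-- Illustrative hash-partitioned table: all rows for a given author_id land
-- in the same partition, so the query touches only that partition.
CREATE TABLE post_table (
    post_id   INT NOT NULL,
    author_id INT NOT NULL,
    body      TEXT
) PARTITION BY HASH (author_id) PARTITIONS 8;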
That said, the power of a database is not in these details. It is in the management of memory. The database will cache the data and the index in memory, so you do not have to re-read data pages. It will take advantage of multiple processors and multiple disks, so you do not have to code for this yourself. And it keeps everything consistent in the face of updates and deletes.
As for your specific question: if the data is in the database, search it there. Loading all the data into an XML file and then doing the search in memory requires a lot of overhead. You would only want to do that if the connection to your database is slow and you are doing many such queries.
Upvotes: 9
Reputation: 13354
Have a look at the EXPLAIN command. It shows you what the database actually does when executing a given SELECT query.
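For example, to see which of the plans described in the other answer your database picks for the query in the question (the output format differs between databases, but the idea is the same):

-- Ask the engine for its execution plan instead of the result rows.
EXPLAIN SELECT * FROM post_table WHERE author_id = 34;
-- With an index on author_id you should see an index (ref) lookup;
-- without one, you will see a full table scan.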
Upvotes: 6