ygesher

Reputation: 1161

What's faster, db calls or resorting an array?

In a site I maintain I have a need to query the same table (articles) twice, once for each category of article. AFAICT there are basically two ways of doing this (maybe someone can suggest a better, third way?):

  1. Perform the db query twice, meaning the db server has to sort through the entire table twice. After each query, I iterate over the cursor to generate html for a list entry on the page.
  2. Perform the query just once and pull out all the records, then sort them into two separate arrays. After this, I have to iterate over each array separately in order to generate the HTML.

So it's this:

$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news' ");
while($newRow = $newsQuery->fetch_assoc()){
  // generate article summary in html
}
// repeat for informational articles

vs this:

$query = $mysqli->query("SELECT * FROM articles ");
$news = array();
$info = array();
while($row = $query->fetch_assoc()){
  if($row['type'] == "news"){
    $news[] = $row;
  }else{
    $info[] = $row;
  }
}
// iterate over each array separately to generate article summaries

The recordset is not very large, currently <200, and will probably grow to 1000-2000. Is there a significant difference in the times between the two approaches, and if so, which one is faster?

(I know this whole thing seems awfully inefficient, but it's a poorly coded site I inherited and have to take care of without a budget for refactoring the whole thing...)

I'm writing in PHP, no framework :(, on a MySQL db.

Edit

I just realized I left out one major detail. On a given page in the site, we will display (and thus retrieve from the db) no more than 30 records at once - but here's the catch: 15 info articles, and 15 news articles. On each page we pull the next 15 of each kind.
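Concretely, the two-query version of that paging looks something like this (simplified; $page is the zero-based page number, and 'info' stands in for whatever value marks the informational articles):

$offset = $page * 15;
// one query per category, 15 rows each; in practice these also need an
// ORDER BY, since LIMIT without one doesn't guarantee a stable page order
$newsQuery = $mysqli->query("SELECT * FROM articles WHERE type='news' LIMIT $offset,15");
$infoQuery = $mysqli->query("SELECT * FROM articles WHERE type='info' LIMIT $offset,15");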

Upvotes: 0

Views: 834

Answers (3)

spencer7593

Reputation: 108510

EDIT

Due to the change made to the question, I'm updating my answer to address the newly revealed requirement: 15 rows for 'news' and 15 rows for not-'news'.

The gist of the question is the same: "which is faster... one query or two separate queries". The gist of the answer remains the same: each database roundtrip incurs overhead (extra time, especially over a network connection to a separate database server), so with all else being equal, reducing the number of database roundtrips can improve performance.

The new requirement really doesn't impact that. What it actually impacts is the query needed to return the specified resultset.

For example:

( SELECT n.*
    FROM articles n 
   WHERE n.type='news'
   LIMIT 15 
)
UNION ALL
( SELECT o.*
    FROM articles o 
   WHERE NOT (o.type<=>'news')
   LIMIT 15 
)

Running that statement as a single query is going to require fewer database resources, and be faster, than running two separate statements and retrieving two disparate resultsets.
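A rough sketch of running that combined statement from PHP and splitting the rows back out by type (using the same $mysqli handle and array-building approach as the question):

$sql = "( SELECT n.* FROM articles n WHERE n.type='news' LIMIT 15 )
        UNION ALL
        ( SELECT o.* FROM articles o WHERE NOT (o.type<=>'news') LIMIT 15 )";
$result = $mysqli->query($sql);

$news = array();
$info = array();
while ($row = $result->fetch_assoc()) {
    // one roundtrip; the rows are separated by type on the client side
    if ($row['type'] == 'news') {
        $news[] = $row;
    } else {
        $info[] = $row;
    }
}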

We weren't provided any indication of what the other values for type can be, so the statement offered here simply addresses two general categories of rows: rows that have type='news', and all other rows that have some other value for type.

That query assumes that type allows for NULL values, and we want to return rows that have a NULL for type. If that's not the case, we can adjust the predicate to be just

 WHERE o.type <> 'news'

Or, if there are specific values for type we're interested in, we can specify that in the predicate instead

 WHERE o.type IN ('alert','info','weather')

If "paging" is a requirement... "next 15", the typical pattern we see applied, LIMIT 30,15 can be inefficient. But this question isn't asking about improving efficiency of "paging" queries, it's asking whether running a single statement or running two separate statements is faster.

And the answer to that question is still the same.

ORIGINAL ANSWER below


There's overhead for every database roundtrip. In terms of database performance, for small sets (like you describe) you're better off with a single database query.

The downside is that you're fetching all of those rows and materializing an array. (But it looks like that's the approach you're using in either case.)

Given the choice between the two options you've shown, go with the single query. That's going to be faster.


As far as a different approach goes, it really depends on what you are doing with those arrays.

You could actually have the database return the rows in a specified sequence, using an ORDER BY clause.

To get all of the 'news' rows first, followed by everything that isn't 'news', you could

ORDER BY type<=>'news' DESC

That's MySQL shorthand for the more ANSI-standards-compliant:

ORDER BY CASE WHEN type = 'news' THEN 1 ELSE 0 END DESC

Rather than fetch every single row and store it in an array, you could just fetch from the cursor as you output each row, e.g.

while($row = $query->fetch_assoc()) {
   echo "<br>Title: " . htmlspecialchars($row['title']);
   echo "<br>byline: " . htmlspecialchars($row['byline']);
   echo "<hr>";
}
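Combining the ORDER BY with that streaming fetch, both sections can still be rendered in a single pass; a minimal sketch (the secondary sort on type and the heading markup are my additions, not from the original code):

$result = $mysqli->query("SELECT * FROM articles ORDER BY type<=>'news' DESC, type");

$currentType = null;
while ($row = $result->fetch_assoc()) {
    // rows arrive grouped: all 'news' first, then the remaining types
    if ($row['type'] !== $currentType) {
        $currentType = $row['type'];
        echo "<h2>" . htmlspecialchars($currentType) . "</h2>";
    }
    echo "<br>Title: " . htmlspecialchars($row['title']);
    echo "<br>byline: " . htmlspecialchars($row['byline']);
    echo "<hr>";
}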

Upvotes: 1

walther

Reputation: 13598

The best way of dealing with a situation like this is to test it for yourself. It doesn't matter how many records you have at the moment; you can simulate whatever amount you'd like, that's never a problem. Also, 1000-2000 is really a small set of data.
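A rough harness for such a test might look like this (fetch_all() assumes the mysqlnd driver; run it a few times so caches warm up, and seed the table with as many fake rows as you want to simulate):

$n = 100; // repetitions, to average out noise

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $mysqli->query("SELECT * FROM articles WHERE type='news'")->fetch_all(MYSQLI_ASSOC);
    $mysqli->query("SELECT * FROM articles WHERE type<>'news'")->fetch_all(MYSQLI_ASSOC);
}
$twoQueries = microtime(true) - $start;

$start = microtime(true);
for ($i = 0; $i < $n; $i++) {
    $mysqli->query("SELECT * FROM articles")->fetch_all(MYSQLI_ASSOC);
}
$oneQuery = microtime(true) - $start;

printf("two queries: %.4fs, one query: %.4fs\n", $twoQueries, $oneQuery);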

I don't quite understand why you'd have to iterate over all the records twice. You should never retrieve all the records in a query anyway, only the small subset you need to be working with. On a typical site where you manage articles it's usually about 10 records per page MAX. No user will ever go through 2000 articles in a way that would force you to pull all the records at once. Utilize paging and smart querying.

// iterate over each array separately to generate article summaries

I'm not really sure what you mean by this, but something tells me this data should be stored in the database as well. I really hope you're not generating article excerpts on the fly for every page hit.

It all sounds to me more like a bad architecture design than anything else...

PS: I believe sorting/ordering/filtering of database data should be done on the database server, not in the application itself. You may save some traffic by doing a single query, but that won't help much if you transfer too much data at once that you won't be using anyway.

Upvotes: 0

Arth

Reputation: 13110

You know you can sort in the DB, right?

SELECT * FROM articles ORDER BY type

Upvotes: 1
