Reputation: 3184
I'm new to pagination, so I'm not sure I fully understand how it works. But here's what I want to do.
Basically, I'm creating a search engine of sorts that generates results from a database (MySQL). These results are merged together algorithmically, and then returned to the user.
My question is this: When the results are merged on the backend, do I need to create a temporary view with the results that is then used by the PHP pagination? Or do I create a table? I don't want a bunch of views and/or tables floating around for each and every query. Also, if I do use temporary tables, when are they destroyed? What if the user hits the "Back" button on his/her browser?
I hope this makes sense. Please ask for clarification if you don't understand. I've provided a little bit more information below.
MORE EXPLANATION: The database contains English words and phrases, each of which is mapped to a concept (Example: "apple" is 0.67 semantically-related to the concept of "cooking"). The user can enter in a bunch of keywords, and find the closest matching concept to each of those keywords. So I am mathematically combining the raw relational scores to find a ranked list of the most semantically-related concepts for the set of words the user enters. So it's not as simple as building a SQL query like "SELECT * FROM words WHERE blah blah..."
Upvotes: 2
Views: 862
Reputation: 36421
It depends on your database engine (i.e. which SQL dialect), but nearly every SQL flavor supports paginating a query.
For example, MySQL has LIMIT and MS SQL has ROW_NUMBER.
So you build your SQL as usual, then add the engine-specific pagination clause, and the server returns only, say, rows 10 to 20 of the query result.
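As a minimal sketch of what that looks like from application code: the example below uses Python's built-in sqlite3 module as a stand-in for MySQL, since both accept `LIMIT ... OFFSET ...`. The `words` table and the `fetch_page` helper are made up for illustration and are not part of the question.

```python
import sqlite3

def fetch_page(conn, page, page_size=10):
    """Return one page of rows; `page` is 1-based."""
    offset = (page - 1) * page_size
    # A stable ORDER BY is needed, otherwise pages may overlap or skip rows.
    cur = conn.execute(
        "SELECT id, word FROM words ORDER BY id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()

# Hypothetical data: 35 rows, so page 4 is only partly filled.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE words (id INTEGER PRIMARY KEY, word TEXT)")
conn.executemany(
    "INSERT INTO words (id, word) VALUES (?, ?)",
    [(i, f"word{i}") for i in range(1, 36)],
)

second_page = fetch_page(conn, page=2)  # rows 11..20
last_page = fetch_page(conn, page=4)    # rows 31..35 (partial page)
```

The only arithmetic involved is turning a page number into an offset. The database does the rest, so the application never holds more than one page in memory.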
EDIT:
So the final query (which selects the data that is returned to the user) selects data from some tables (temporary or not), as I expected.
It's a SELECT query, which you can page with LIMIT in MySQL.
Your description sounds to me as if the actual calculation is way more resource-hogging than the final query which returns the results to the user.
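So I would do the following: run the calculation once and save the results in a table (temporary or not), then return the results to the user page by page by selecting from that table with LIMIT.
So you have to do the actual calculation (the resource-hogging queries) only once when the user "starts" the query. Then you can return paginated results to the user by just selecting from the already populated results table.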
EDIT 2:
I just saw that you accepted my answer, but still, here's more detail about my usage of "temporary" tables.
Of course this is only one possible way to do it. If the expected result is not too large, returning the whole resultset to the client, keeping it in memory and doing the paging client side (as you suggested) is possible as well.
But if we are talking about really huge amounts of data, of which the user will only view a small part (think Google search results), and/or low bandwidth, then you want to transfer as little data as possible to the client.
That's what I was thinking about when I wrote this answer.
So: I don't mean a "real" temporary table, I'm talking about a "normal" table used for saving temporary data.
I'm way more proficient in MS SQL than in MySQL, so I don't know much about temp tables in MySQL.
I can tell you how I would do it in MS SQL, but maybe there's a better way to do this in MySQL that I don't know.
When I have to page a resource-intensive query, I want to do the actual calculation once, save it in a table and then query that table several times from the client (to avoid doing the calculation again for each page).
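The problem is: in MS SQL, a local temp table only exists for the session (connection) or stored procedure that created it, and a web application usually can't count on getting the same connection for the next page request.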
So I can't use a temp table for that because it would be gone when I want to query it the second time.
So I use "real" tables for things like that.
I'm not sure whether I understood your algorithm example correctly, so I'll simplify the example a bit. I hope that I can make my point clear anyway:
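This is the table (the column types are only my guess at MySQL equivalents, it's just to show the concept):
create table AlgorithmTempTable
(
    QueryID char(36) not null,  -- GUID stored as text; identifies one search
    `Rank` float,               -- backticks because RANK is a reserved word in MySQL 8.0+
    Value float
)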
As I said before - it's not literally a "temporary" table, it's actually a real permanent table that is just used for temporary data.
Now the user opens your application, enters his search words and presses the "Search" button.
Then you start your resource-heavy algorithm to calculate the result once, and store it in the table:
insert into AlgorithmTempTable (QueryID, Rank, Value)
    select '12345678-9012-3456789', foo, bar
    from Whatever;
insert into AlgorithmTempTable (QueryID, Rank, Value)
    select '12345678-9012-3456789', foo2, bar2
    from SomewhereElse;
The GUID must be known to the client. Maybe you can use the client's session ID for that (if the client has one and can't start more than one query at once), or you can generate a new GUID on the client each time the user presses the "Search" button, or whatever.
Now all the calculation is done, and the ranked list of results is saved in the table.
Now you can query the table, filtering by the QueryID:
select Rank, Value
from AlgorithmTempTable
where QueryID = '12345678-9012-3456789'
order by Rank
limit 0, 10
Because of the QueryID, multiple users can do this at the same time without interfering with each other's queries. If you create a new QueryID for each search, the same user can even run multiple queries at once.
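To make the store-once, page-many flow concrete, here is a rough end-to-end sketch. It uses Python's sqlite3 as a stand-in for MySQL, and the helper names (`run_search`, `get_page`, `close_search`) are hypothetical. The "expensive calculation" is faked with a list of made-up (rank, value) pairs, and the QueryID is generated with `uuid.uuid4()`, which is just one way to get a fresh ID per search.

```python
import sqlite3
import uuid

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE AlgorithmTempTable (QueryID TEXT, Rank REAL, Value REAL)"
)

def run_search(conn, scored_results):
    """Store an already-computed ranked list once and return its QueryID."""
    query_id = str(uuid.uuid4())
    conn.executemany(
        "INSERT INTO AlgorithmTempTable (QueryID, Rank, Value) VALUES (?, ?, ?)",
        [(query_id, rank, value) for rank, value in scored_results],
    )
    return query_id

def get_page(conn, query_id, page, page_size=10):
    """Return the 1-based `page` of one search, ordered by Rank."""
    offset = (page - 1) * page_size
    return conn.execute(
        "SELECT Rank, Value FROM AlgorithmTempTable "
        "WHERE QueryID = ? ORDER BY Rank LIMIT ? OFFSET ?",
        (query_id, page_size, offset),
    ).fetchall()

def close_search(conn, query_id):
    """Delete only the rows that belong to this search."""
    conn.execute("DELETE FROM AlgorithmTempTable WHERE QueryID = ?", (query_id,))

# Two searches share the table; the scores below are placeholders.
first_id = run_search(conn, [(r, r * 0.5) for r in range(1, 26)])
second_id = run_search(conn, [(r, r * 2.0) for r in range(1, 6)])

first_page_3 = get_page(conn, first_id, page=3)    # ranks 21..25
second_page_1 = get_page(conn, second_id, page=1)  # ranks 1..5

close_search(conn, first_id)  # the second search's rows are untouched
```

Every statement filters on QueryID, so the two searches never see each other's rows, and deleting one leaves the other intact.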
Now there's only one thing left to do: delete the temporary data when it's not needed anymore (only the data! The table is never dropped).
So, if the user closes the query screen:
delete
from AlgorithmTempTable
where QueryID = '12345678-9012-3456789'
This is not ideal in some cases, though. If the application crashes, the data stays in the table forever.
There are several better ways. Which one is the best for you depends on your application. Some possibilities:
Upvotes: 2
Reputation: 5184
Paging results can be very tricky. The way I have done this is as follows: set an upper bound for any query that may be run, for example 5,000 rows. If a query returns more than 5,000 rows, limit the results to the first 5,000.
This is best done using a stored procedure.
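A rough sketch of such a cap, done here in application code rather than in a stored procedure, with Python's sqlite3 standing in for MySQL. The `results` table, `MAX_ROWS`, and `fetch_capped_page` are illustrative names only.

```python
import sqlite3

MAX_ROWS = 5000  # upper bound on rows any single search may expose

def fetch_capped_page(conn, page, page_size=50):
    """Return the 1-based `page`, never reaching past the first MAX_ROWS rows."""
    offset = (page - 1) * page_size
    if offset >= MAX_ROWS:
        return []
    # Shrink the final page so offset + limit never exceeds the cap.
    limit = min(page_size, MAX_ROWS - offset)
    return conn.execute(
        "SELECT id FROM results ORDER BY id LIMIT ? OFFSET ?",
        (limit, offset),
    ).fetchall()

# Hypothetical data: 6,000 rows, i.e. more than the cap allows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE results (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO results (id) VALUES (?)", [(i,) for i in range(1, 6001)]
)

page_100 = fetch_capped_page(conn, 100)  # rows 4951..5000, the last allowed page
page_101 = fetch_capped_page(conn, 101)  # beyond the cap, so empty
```

The same clamping logic could live in a stored procedure instead. The sketch only shows the arithmetic.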
Upvotes: 0