Reputation: 9349
I've been given a big project by a big client and I've been working on it for 2 months now. I'm getting closer and closer to a solution but it's just so insanely complex that I can't quite get there, and so I need ideas.
The project is quite simple: there is a database of 1 million+ lat/lng coordinates with lots of additional data for each record. A user visits a page and enters some search terms, which filter out quite a lot of the records. All of the records that match the filter are displayed (often clustered) on a Google Map.
The problem with this is that the client demands it be fast, lean, and low-bandwidth. Hence, I'm stuck. What I'm currently doing is: present the first clusters, and when the user hovers over a cluster, begin loading in the data for that cluster's children.
However, I've upped it to 30,000 of the millions of listings and it's starting to drag a little. I've made as many optimizations as I possibly can. When the filter is changed, I AJAX a query to the DB and return all the IDs of the matches, then update the map to reflect this.
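To give an idea of the current approach, the filter endpoint boils down to something like the sketch below (heavily simplified; I'm using Flask and SQLite here just for illustration, and the `listings` table and columns are stand-ins for the real schema):

```python
# Simplified sketch of the filter endpoint: run the filtered query and
# return only the matching IDs. Flask/SQLite and the `listings` table
# are stand-ins for illustration.
import sqlite3
from flask import Flask, request, jsonify

app = Flask(__name__)

@app.route("/filter")
def filter_listings():
    term = request.args.get("q", "")
    conn = sqlite3.connect("listings.db")
    rows = conn.execute(
        "SELECT id FROM listings WHERE category LIKE ?",
        (f"%{term}%",),
    ).fetchall()
    conn.close()
    # The client-side map is then updated to show/cluster these IDs.
    return jsonify([r[0] for r in rows])
```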
So, optimization is not an option. I need an entirely new conceptual model for this. Any input at all would be highly appreciated, as this is an incredibly complex project and I can't find anything even remotely close to it. I even looked at MMORPGs, which have a lot of similar problems (and I have made a few), but the concept of having a million players in one room is still something MMORPG makers cringe at. People keep suggesting there may be bottlenecks to tune, but let me stress that this is not a case of optimizing in that way. I need a new model in which a huge database stays on the server but is displayed fluidly to the user.
I'll be awarding 500 rep as soon as it becomes available for anything that solves this.
Thanks- Daniel.
Upvotes: 2
Views: 289
Reputation: 12831
Google has a service named "BigQuery". It is an SQL engine in the cloud: it runs on Google's fast servers and can search millions of rows of data quickly. Unfortunately it is not free, but maybe it will help you out:
https://developers.google.com/bigquery/
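For example, once your listings are loaded into a BigQuery table, a filtered query from Python could look roughly like this (a sketch assuming the `google-cloud-bigquery` client library; the project/dataset/table names are made up):

```python
# Rough sketch of querying BigQuery from Python, assuming the
# google-cloud-bigquery package and a hypothetical `listings` table.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project credentials
sql = """
    SELECT id, lat, lng
    FROM `my_project.my_dataset.listings`
    WHERE category = @category
"""
job_config = bigquery.QueryJobConfig(
    query_parameters=[
        bigquery.ScalarQueryParameter("category", "STRING", "hotel"),
    ]
)
for row in client.query(sql, job_config=job_config):
    print(row.id, row.lat, row.lng)
```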
Upvotes: 0
Reputation: 1761
Let's be honest here; a db with 1 million records being accessed by presumably a large number of users is not going to run very well unless you put some extremely powerful hardware behind it.
In this type of case, I would suggest using several different database servers and setting up some decent load-balancing regimes to keep them running as smoothly as possible. First and foremost, you will need to find out the "average" load you can place on a db server before it starts to lag; let's say, for example, this is 50,000 records. Setting a low MaxClients per server may help with server performance and prevent crashes, but it might aggravate your users when they can't execute any queries due to high load. Still, it's something to keep in mind if your budget doesn't allow for much wiggle room hardware-wise.
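To sketch the idea (just an illustration, assuming SQLAlchemy and made-up replica hostnames; in practice you would likely put a proper load-balancing proxy in front of the servers):

```python
# Sketch: cap connections per server and spread reads across replicas.
# SQLAlchemy, the hostnames, and credentials are assumptions.
import itertools
from sqlalchemy import create_engine, text

REPLICAS = ["db1.example.com", "db2.example.com", "db3.example.com"]

# One pool per replica, each capped so a single box never gets swamped
# (the pool_size plays a similar role to a low MaxClients setting).
engines = [
    create_engine(
        f"mysql+pymysql://app:secret@{host}/listings",
        pool_size=20,
        max_overflow=0,
    )
    for host in REPLICAS
]
round_robin = itertools.cycle(engines)

def run_read_query(sql, **params):
    engine = next(round_robin)  # naive round-robin load balancing
    with engine.connect() as conn:
        return conn.execute(text(sql), params).fetchall()
```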
On the topic of hardware, however, that's something you really need to take a look at. Databases typically don't use a huge amount of CPU/RAM, but they can be quite taxing on your HDD. I would recommend going for SAS or SSD drives before looking at other components in your setup; these will make a world of difference for you.
As far as load balancing goes, a very common technique used by most content providers is to cache the result of any query or content item (such as a popular video on YouTube) that is pulling in an above-average amount of traffic. A quick and dirty approach is an if statement in your search handler that serves a static HTML page instead of actually running the query.
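In code, the quick-and-dirty version is little more than a lookup before the query runs; something like this sketch (the hot-term list and cache paths are made up for illustration):

```python
# Sketch of short-circuiting popular searches to a pre-rendered page
# instead of hitting the database. Hot terms and paths are made up.
import os

HOT_TERMS = {"london", "new york", "paris"}   # terms known to be popular
CACHE_DIR = "cache"

def handle_search(term, run_query):
    key = term.strip().lower()
    cached_page = os.path.join(CACHE_DIR, f"{key}.html")
    if key in HOT_TERMS and os.path.exists(cached_page):
        # Serve the pre-rendered result, skipping the database entirely.
        with open(cached_page, encoding="utf-8") as f:
            return f.read()
    # Fall back to the normal (expensive) query path.
    return run_query(term)
```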
Another approach is to have a separate, standalone db server used only for the queries that take in an excessive amount of traffic.
With that, never underestimate your code optimisation. While the differences may seem subtle to you, when run across millions of queries by thousands of users, those tiny differences really do add up.
Best of luck with it - let me know if you need any further assistance.
Upvotes: 1
Reputation: 33512
I think there are a number of possible answers to your question depending on where it is slowing down, so here are a few thoughts.
A wider table can affect the speed with which a query is returned. Longer records mean that more disk is read to get to the right data, so you might want to think about limiting your initial table to hold only the information that the filters actually use. Having said that, it will also depend on the db engine you are using; some suffer more than others.
Ensuring that your tables are correctly indexed makes a HUGE difference in performance. You need to make sure that the query is using the indexes to quickly get to the records that it needs.
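Combining those two points, a slimmed-down filter table with indexes on the filterable columns might look something like this (just a sketch using SQLite; the table and column names are hypothetical):

```python
# Sketch: a narrow "filter" table holding only the searchable columns,
# with indexes on them, so the wide detail rows are fetched only for
# matches. Table and column names are hypothetical.
import sqlite3

conn = sqlite3.connect("listings.db")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS listing_filter (
        id       INTEGER PRIMARY KEY,   -- points at the wide detail table
        lat      REAL NOT NULL,
        lng      REAL NOT NULL,
        category TEXT NOT NULL,
        price    INTEGER NOT NULL
    );
    CREATE INDEX IF NOT EXISTS idx_filter_category ON listing_filter (category);
    CREATE INDEX IF NOT EXISTS idx_filter_price    ON listing_filter (price);
    CREATE INDEX IF NOT EXISTS idx_filter_lat_lng  ON listing_filter (lat, lng);
""")

# The search query only ever touches the narrow, indexed table.
ids = conn.execute(
    "SELECT id FROM listing_filter WHERE category = ? AND price <= ?",
    ("hotel", 200),
).fetchall()
conn.close()
```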
A friend was working with Google Maps and said that the API really suffered if too many markers were displayed on the map. This might just be totally out of your control.
Having worked for Epic Games in the past, I can say that the reason "millions of players in a room" is something to cringe at is more often hardware driven. In a game, having that number of players would grind the graphics card to a halt as it tries to render all the polygons of the models. Secondly (and likely more importantly), the problem is that you have to send each client information about what every other item/player is doing. This means that your bandwidth use will spike very heavily. Your server might handle the load, but the players' internet connections might not.
I do think that you need to edit your question though with some extra information on WHAT is slowing down. Your database? Your query? Google API? The transfer of data between server and client machine?
Upvotes: 1