Reputation: 2569
I need to be able to quickly find n closest destinations for a given destinations, calculate n x n distance matrix for n destinations and several other such operation related to distances between two or more destination.
I have learned a Graph DB will give far better performance compared to a MySQL database. My application is written in PHP.
SO my question is - Is it possible to use Graph DB with a PHP application, If yes then which one is the best option and opensource and how to store this data in graph DB and how would it be accessed.
Thanks in advance.
Upvotes: 4
Views: 2936
Reputation: 5771
Actually it's not that much about database as about indexes. I've used MongoDB's geospatial indexing and search (document DB), which has geo indexing designed for finding multiple nearest elements to given coordinates - with good results. Still - it runs only simple queries (find nearest) and it gets a bit slow if your index doesn't fit in the RAM (I've used geonames DB with 8mln places with coordinates and got 0.005-2.5s per query on VM - 1. hdd overhead 2. probably the index didn't fit in the RAM).
Upvotes: 1
Reputation: 31280
Neo4j is a very solid graph DB and has flexible (if a bit complex) licensing as well. It implements the Blueprints API and should be pretty easy to use from just about any language, including PHP. It also has a REST API as well, which is about as flexible as it gets, and there is at least one good example of using it from PHP.
Depending on what data you have, there are a number of ways to store it.
If you have "route" data, where your points are already connected to each other via specific paths (ie. you can't jump from one point directly to another), then you simply make each point a node and the connections you have between points in your routes are edges between nodes, with the distances as properties of those edges. This would give you a graph that looks like your classic "traveling salesman" sort of problem, and calculating distances between nodes is just a matter of doing a weighted breadth-first search (assuming you want shortest path).
If you can jump from place to place with your data set, then you have a fully connected graph. Obviously this is a lot of data, and grows quadratically as you add more destinations, but a graph DB is probably better at dealing with this than a relational DB is. To store the distances, as you add nodes to the graph, you also add an edge to each other existing node with the distance pre-calculated as one of it's properties. Then, to retrieve the distances between a pair of nodes, you simply find the edge between them and get it's distance property.
However, if you have a large number of fully-connected nodes, you would probably be better off just storing the coordinates of those nodes and calculating the distances as-needed, and optionally caching the results to speed things up.
Lastly, if you use the Blueprints API and the other tools in that stack, like Gremlin and Rexter, you should be able to swap in/out any compatible graph database, which lets you play around with different implementations that may meet your needs better, like using Titan on top of a Cassandra / Hadoop cluster.
Upvotes: 4
Reputation: 154563
Yes, a graph database will give you more performance than an extension for MySQL or Postgres will be able to. One that looks really slick is OrientDB, a there's a beta implementation in PHP using the binary protocol and another one that uses HTTP as the transport layer.
As for the example code, Alessandro (from odino.org) wrote a implementation of Dijkstra's algorithm along with a full explanation of how to use it with OrientDB to find the minimum distance between cities.
Upvotes: 1