soulkphp
soulkphp

Reputation: 3833

What is the most efficient way to find the euclidean distance in 3d using mysql?

I have a MySQL table with thousands of data points stored in 3 columns R, G, B. how can I find which data point is closest to a given point (a,b,c) using Euclidean distance?

I'm saving RGB values of colors separately in a table, so the values are limited to 0-255 in each column. What I'm trying to do is find the closest color match by finding the color with the smallest euclidean distance.

I could obviously run through every point in the table to calculate the distance but that wouldn't be efficient enough to scale. Any ideas?

Upvotes: 11

Views: 6585

Answers (5)

David Svarrer
David Svarrer

Reputation: 111

I think the above comments are all true, but they are - in my humble opinion - not answering the original question. (Correct me if I'm wrong). So, let me here add my 50 cents:

You are asking for a select statement, which, given your table is called 'colors', and given your columns are called r, g and b, they are integers ranged 0..255, and you are looking for the value, in your table, closest to a given value, lets say: rr, gg, bb, then I would dare trying the following:

select min(sqrt((rr-r)*(rr-r)+(gg-g)*(gg-g)+(bb-b)*(bb-b))) from colors;

Now, this answer is given with a lot of caveats, as I am not sure I got your question right, so pls confirm if it's right, or correct me so that I can be of assistance.

Upvotes: 3

user845279
user845279

Reputation: 2804

  1. Since you're looking for the minimum distance and not exact distance you can skip the square root. I think Squared Euclidean Distance applies here.
  2. You've said the values are bounded between 0-255, so you can make an indexed look up table with 255 values.

Here is what I'm thinking in terms of SQL. r0, g0, and b0 represent the target color. The table Vector would hold the square values mentioned above in #2. This solution would visit all the records but the result set can be set to 1 by sorting and selecting only the first row.

select 
    c.r, c.g, c.b,
    mR.dist + mG.dist + mB.dist as squared_dist
from 
    colors c,
    vector mR,
    vector mG,
    vector mB
where
    c.r-r0 = mR.point and
    c.g-g0 = mG.point and
    c.b-b0 = mB.point
group by
    c.r, c.g, c.b

Upvotes: 2

Bailey Parker
Bailey Parker

Reputation: 15905

The first level of optimization that I see you can do would be square the distance to which you want to limit the query so that you don't need to perform the square root for each row. The second level of optimization I would encourage would be some preprocessing to alleviate the need for extraneous squaring for each query (which could possibly create some extra run time for large tables of RGB's). You'd have to do some benchmarking to see, but by substituting in values for a, b, c, and d and then performing the query, you could alleviate some stress from MySQL.

Latex

Note that the performance difference between the last two lines may be negligible. You'll have to use test queries on your system to determine which is faster.

I just re-read and noticed that you are ordering by distance. In which case, the d should be removed everything should be moved to one side. You can still plug in the constants to prevent extra processing on MySQL's end.

Upvotes: 2

JustinDanielson
JustinDanielson

Reputation: 3185

If you run through every point and calculate the distance, don't use the square root function, it isn't necessary. The smallest sum of squares will be enough.

This is the problem you are trying to solve. (Planar case, select all points sorted by a x, y, or z axis. Then use PHP to process them)

MySQL also has a Spatial Database which may have this as a function. I'm not positive though.

Upvotes: 0

shaunhusain
shaunhusain

Reputation: 19748

I believe there are two options.

You have to either as you say iterate across the entire set and compare and check against a maximum that you set initially at an impossibly low number like -1. This runs in linear time, n times (since you're only comparing 1 point to every point in the set, this scales in a linear way).

I'm still thinking of another option... something along the lines of doing a breadth first search away from the input point until a point is found in the set at the searched point, but this requires a bit more thought (I imagine the 3D space would have to be pretty heavily populated for this to be more efficient on average though).

Upvotes: 0

Related Questions