Reputation: 4644
I am looking for an algorithm for lossy compression of GPS track waypoints with coordinates in the EPSG:4326 CRS (the usual degrees, e.g. (75.223423, -10.123123)).
In short: after wiping out meta information and simplifying with the Ramer–Douglas–Peucker algorithm, I have an ordered sequence of waypoint coordinates, where each waypoint takes 16 bytes (2 x 8-byte doubles).
Given that the waypoints are ordered and that the distance between consecutive waypoints is in most cases less than 0.01° (~1 km at the equator), I assume there may be some kind of lossy compression algorithm for such sequences.
Could you help me find one, please?
UPD: Based on real tracks (~800 analyzed), the distances in degrees between consecutive points are shown below. p95 is the 95th percentile of all distances.
LON
avg: 0.000334560520818109
p95: 0.001239999999999240 # ~138 meters
max: 0.307273900000000000
LAT
avg: 0.000221987685948093
p95: 0.000839999999996621
max: 0.309625799999999000
Upvotes: 2
Views: 1092
Reputation: 4644
I found an existing solution: the TWKB (Tiny Well-known Binary, also known as TinyWKB) data format suits my needs (LineString support is enough for my current use case).
TWKB is the latest development in the WKT -> WKB -> TWKB lineage of formats for storing geometry primitives (Points, LineStrings, Polygons, MultiPoints, MultiLineStrings, MultiPolygons, GeometryCollections).
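For readers who want a feel for what TWKB does with coordinates: it quantises them to integers at a fixed decimal precision, delta-encodes each axis against the previous vertex, and writes the deltas as zigzag-encoded varints. Below is a minimal, illustrative C sketch of that core idea only; it is not a spec-complete TWKB writer, and the function names are mine.

#include <stdint.h>
#include <stddef.h>
#include <math.h>

/* Zigzag: map signed to unsigned so that small magnitudes stay small. */
static uint64_t zigzag( int64_t v)
{
  return v < 0 ? ~((uint64_t)v << 1) : (uint64_t)v << 1;
}

/* Unsigned varint: 7 payload bits per byte, high bit = "more bytes follow". */
static size_t write_varint( uint64_t v, uint8_t* out)
{
  size_t n = 0;
  while( v >= 0x80) { out[n++] = (uint8_t)(v | 0x80); v >>= 7; }
  out[n++] = (uint8_t)v;
  return n;
}

/* Encode one coordinate axis with 'precision' decimal digits (the lossy step).
   The caller must provide an output buffer with enough room (10 bytes per value
   is a safe upper bound). Returns the number of bytes written. */
static size_t encode_axis( const double* deg, int n, int precision, uint8_t* out)
{
  double scale = pow( 10.0, precision);
  int64_t prev = 0;
  size_t len = 0;
  for( int i = 0; i < n; ++i)
  {
    int64_t q = llround( deg[i]*scale);                   /* quantise to fixed precision */
    len += write_varint( zigzag( q - prev), out + len);   /* store the delta */
    prev = q;
  }
  return len;
}

With a precision of 6 (roughly 0.11 m per count at the equator), the p95 deltas from the question come out around 1240 and 840 counts, so they fit in two varint bytes per axis instead of eight bytes per double.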
Upvotes: 1
Reputation: 4431
In case you need to write the code yourself, here's a trick for a simple 'delta' encoding scheme that avoids losing accuracy unnecessarily.
The idea is that the compressor should compute deltas NOT between the current data point and the last one, but between the current data point and what the decompressor will have reconstructed for the last one. This means that the quantisation errors do not add up.
As a simple example that you might want to peruse and experiment with, here is C code that compresses doubles into a start double and a sequence of floats:
#include <stdlib.h>   /* malloc */

typedef struct
{ double start;    /* first value, stored exactly */
  int n;           /* number of deltas */
  float* delta;    /* quantised differences */
} compT;
compT compress( int n, const double* data)
{
  compT c = (compT){ data[0], n-1, malloc( (n-1)*sizeof *c.delta)};
  double r = c.start;                    /* r mirrors what the decompressor will reconstruct */
  for( int i=1; i<n; ++i)
  {
    float d = (float)(data[i]-r);        /* delta against the reconstruction, not the raw previous value */
    c.delta[i-1] = d;
    r += d;                              /* advance the reconstruction, carrying its rounding error */
  }
  return c;
}
static double* uncompress( compT c)
{
  double* d = malloc( (c.n+1)*sizeof *d);
  double r = c.start;
  d[0] = r;
  for( int i=1; i<=c.n; ++i)
  { r += c.delta[i-1];
    d[i] = r;
  }
  return d;
}
/* For comparison: naive deltas against the raw previous value; rounding errors accumulate. */
compT bad_compress( int n, const double* data)
{
  compT c = (compT){ data[0], n-1, malloc( (n-1)*sizeof *c.delta)};
  for( int i=1; i<n; ++i)
  {
    float d = (float)(data[i]-data[i-1]);
    c.delta[i-1] = d;
  }
  return c;
}
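To see the difference, here is a small test harness of my own (not part of the original answer); it assumes the definitions above are in the same file and compares the maximum reconstruction error of the two compressors on a synthetic track:

#include <stdio.h>
#include <stdlib.h>
#include <math.h>

int main( void)
{
  enum { N = 100000 };
  double* lon = malloc( N*sizeof *lon);
  lon[0] = 75.223423;                        /* synthetic longitude-like track with small random steps */
  for( int i=1; i<N; ++i)
    lon[i] = lon[i-1] + 0.0003*((double)rand()/RAND_MAX - 0.5);

  compT good = compress( N, lon);
  compT bad  = bad_compress( N, lon);
  double* dg = uncompress( good);
  double* db = uncompress( bad);

  double eg = 0, eb = 0;
  for( int i=0; i<N; ++i)
  {
    eg = fmax( eg, fabs( lon[i]-dg[i]));
    eb = fmax( eb, fabs( lon[i]-db[i]));
  }
  printf( "max error: compress %.3e deg, bad_compress %.3e deg\n", eg, eb);
  return 0;
}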
When using floats, the quantisation error build-up is really only noticeable on long (millions of points) data sequences.
However, when hoping to compress more, the effects are more noticeable. The code below uses an int16_t for the deltas. I've used this in cases where the data values were guaranteed to be around 1 km, so I used a scale (in the code below) of 16, and so could cope with differences of up to 2 km.
#include <stdlib.h>   /* malloc */
#include <stdint.h>   /* int16_t */
#include <math.h>     /* round */

typedef struct
{ float scale;     /* counts per unit, e.g. 16 counts per metre */
  double start;    /* first value, stored exactly */
  int n;           /* number of deltas */
  int16_t* delta;  /* quantised differences */
} compiT;
compiT compressi( int n, const double* data, float scale)
{
  compiT c = (compiT){ scale, data[0], n-1, malloc( (n-1)*sizeof *c.delta)};
  double r = c.start;                                    /* decompressor's view of the last value */
  for( int i=1; i<n; ++i)
  {
    int16_t d = (int16_t)round(c.scale*(data[i]-r));     /* quantise against the reconstruction */
    c.delta[i-1] = d;
    r += ((double)d)/c.scale;                            /* advance by the quantised step */
  }
  return c;
}
/* For comparison: naive version; quantisation errors accumulate along the run. */
compiT bad_compressi( int n, const double* data, float scale)
{
  compiT c = (compiT){ scale, data[0], n-1, malloc( (n-1)*sizeof *c.delta)};
  for( int i=1; i<n; ++i)
  {
    int16_t d = (int16_t)round(c.scale*(data[i]-data[i-1]));
    c.delta[i-1] = d;
  }
  return c;
}
static double* uncompressi( compiT c)
{
  double* d = malloc( (c.n+1)*sizeof *d);
  double r = c.start;
  d[0] = r;
  for( int i=1; i<=c.n; ++i)
  {
    double delta = ((double)c.delta[i-1])/c.scale;
    r += delta;
    d[i] = r;
  }
  return d;
}
In runs of arbitrary length, the maximum error (i.e. the largest difference between the original data and the decompressed compressed data) was, as it should be, around 3 cm (half of the 1/16 m quantisation step), whereas with the bad compressor the errors were around 0.5 m on runs of 1000 and 2.5 m on runs of 10000.
Of course, where you have no guarantee that the differences are bounded, you'll need to incorporate some sort of restart (one possible escape-marker approach is sketched below).
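One possible restart mechanism (my own sketch, not from the answer above): reserve one delta value, e.g. INT16_MIN, as an escape marker and follow it with a raw double, so an out-of-range difference restarts the stream instead of overflowing:

#include <stdint.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>

typedef struct { uint8_t* buf; size_t len; } stream;

static void put( stream* s, const void* p, size_t n)
{
  memcpy( s->buf + s->len, p, n);
  s->len += n;
}

/* Stream layout: [double start] then, per point, either a 2-byte delta or the
   2-byte escape marker INT16_MIN followed by a raw 8-byte double (a restart). */
static stream compress_restart( int n, const double* data, float scale)
{
  stream s = { malloc( sizeof(double) + (size_t)(n-1)*10), 0 };   /* worst case: every point escaped */
  put( &s, &data[0], sizeof data[0]);
  double r = data[0];                                  /* decompressor's reconstruction */
  for( int i=1; i<n; ++i)
  {
    double scaled = round( scale*(data[i]-r));
    if( scaled > INT16_MAX || scaled <= INT16_MIN)     /* delta does not fit: restart */
    {
      int16_t esc = INT16_MIN;
      put( &s, &esc, sizeof esc);
      put( &s, &data[i], sizeof data[i]);
      r = data[i];
    }
    else
    {
      int16_t d = (int16_t)scaled;
      put( &s, &d, sizeof d);
      r += ((double)d)/scale;
    }
  }
  return s;
}

The decoder mirrors this: read two bytes; if they equal the marker, read a raw double and reset the running value, otherwise divide the delta by the scale and accumulate.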
Upvotes: 2
Reputation: 112394
You don't need an eight-byte floating point format for the number of significant figures you expect, and for the limited range of possible values. I would start by converting the data to a sequence of integers of appropriate length that can represent the accuracy and range of your values. It looks like two four-byte integers should suffice. There's a factor-of-two compression right there.
Then replace each point with the difference between that point and the previous one, except for the first point. Now the integers should be smaller, allowing any general-purpose lossless compressor to reduce them further. If the differences are only around 0.1°, you could get another factor of two.
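A minimal sketch of those two steps (my own illustration; the scale of 10^7 counts per degree is an assumption: it gives about 1.1 cm of resolution at the equator and ±180° still fits in a 32-bit integer):

#include <stdint.h>
#include <math.h>

#define SCALE 1e7    /* counts per degree: ~1.1 cm at the equator, 180*1e7 < 2^31 */

/* Quantise degrees to fixed-point int32 and delta-encode in place:
   q[0] stays absolute, q[1..n-1] become differences. The result is what you
   would feed to a general-purpose lossless compressor. */
static void quantise_and_delta( const double* deg, int32_t* q, int n)
{
  for( int i=0; i<n; ++i)
    q[i] = (int32_t)lround( deg[i]*SCALE);   /* the only lossy step */
  for( int i=n-1; i>0; --i)
    q[i] -= q[i-1];                          /* back to front, so earlier values stay intact */
}

Because the deltas are taken on the already-quantised integers, decoding reconstructs them exactly; the only loss is the initial rounding, so there is no error accumulation to worry about.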
That is a simple example, where the prediction for the current point is the last point. If your sequence of points represents a path, you could do something more sophisticated and model a velocity: propagate the last point using the last modeled velocity, and subtract that prediction from the current point.
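A sketch of that velocity model on the quantised integers, replacing the plain delta step above (again my own illustration; the 'velocity' is simply the previous step, so the prediction is q[i-1] + (q[i-1] - q[i-2])):

#include <stdint.h>

/* Replace each quantised value with its residual against a constant-velocity
   prediction. q[0] stays absolute, q[1] becomes a plain delta. Done in place,
   back to front, so the decoder can undo it front to back. */
static void predict_velocity( int32_t* q, int n)
{
  for( int i=n-1; i>=2; --i)
  {
    int64_t predicted = (int64_t)q[i-1] + ((int64_t)q[i-1] - q[i-2]);   /* last point + last velocity */
    q[i] = (int32_t)((int64_t)q[i] - predicted);
  }
  if( n > 1)
    q[1] -= q[0];
}

For smooth, evenly sampled tracks the residuals tend to be smaller than plain deltas; after Ramer–Douglas–Peucker simplification the gain may be more modest, since the surviving points are exactly the ones where the direction changes.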
Amendment
I found that the WGS84 reference frame itself is accurate to only 2-5 meters. With three-byte integers, you can get 2.4 meters resolution at the equator. That provides a 62.5% reduction, before differencing and compression. It also gives you a free bit in the latitude, since it has half the range of a longitude. That bit could be used to mark whether this is an absolute coordinate or predicted from the last coordinate.
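To make the arithmetic concrete: 2^24 codes over 360° is about 2.1e-5° per code, roughly 2.4 m at the equator, and two 3-byte integers are 6 bytes versus the original 16 (the 62.5% reduction). Below is a small sketch of that packing, with the spare latitude bit used as a flag (my own illustration; pole and antimeridian edge cases would need care in real code):

#include <stdint.h>
#include <math.h>

#define SCALE24 ((double)(1 << 24) / 360.0)   /* ~46603 codes per degree, ~2.4 m at the equator */

/* Longitude: full 24-bit range over [-180, 180). */
static uint32_t pack_lon24( double lon_deg)
{
  return (uint32_t)lround( (lon_deg + 180.0)*SCALE24) & 0xFFFFFF;
}

/* Latitude: only [-90, 90], so 23 bits suffice; the top bit is free for a flag
   such as "absolute coordinate" vs "predicted from the previous one". */
static uint32_t pack_lat24( double lat_deg, int absolute_flag)
{
  uint32_t q = (uint32_t)lround( (lat_deg + 90.0)*SCALE24) & 0x7FFFFF;
  return ((uint32_t)(absolute_flag & 1) << 23) | q;
}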
Upvotes: 2