filimonic

Reputation: 4644

Is there any algorithm suitable for compression of GPS waypoints?

I am looking for any algorithm for lossy compression of GPS track waypoints with coordinates in the EPSG:4326 CRS (the usual degrees, like (75.223423, -10.123123)).

In short, after stripping meta information and simplifying with the Ramer–Douglas–Peucker algorithm, I have an ordered sequence of waypoint coordinates; each waypoint takes 16 bytes (2 × 8-byte doubles).

Using the knowledge that the waypoints are ordered and that the distance between consecutive waypoints is in most cases less than 0.01° (~1 km at the equator), I assume there may be some kind of lossy compression algorithm for such sequences.

Could you help me find one, please?

UPD: Statistics from real tracks (~800 analyzed) on the distance in degrees between consecutive points are shown below. P95 is the 95th percentile of all distances.

LON
    avg: 0.000334560520818109
    p95: 0.001239999999999240 # ~138 meters
    max: 0.307273900000000000
LAT
    avg: 0.000221987685948093
    p95: 0.000839999999996621
    max: 0.309625799000000000

Upvotes: 2

Views: 1092

Answers (3)

filimonic

Reputation: 4644

I found an existing solution: the TWKB (TinyWKB, Tiny Well-Known Binary) data format suits my needs.

  • It stores geometry primitives (LineString support covers my current needs).
  • It stores data using deltas.
  • It supports configurable floating-point precision (+2 to -2).
  • It supports ID data for geometries.
  • It can be written using the .NET module for NetTopologySuite.

TWKB evolved from the WKT -> WKB lineage to store geometry primitives (Points, LineStrings, Polygons, MultiPoints, MultiLineStrings, MultiPolygons, GeometryCollections).
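For intuition, here is a minimal C sketch of the delta idea TWKB builds on: scale coordinates to integers at a chosen precision, delta-encode consecutive values, zigzag-map each signed delta, and write it as a varint. This is only an illustration of the technique, not the real TWKB wire format; the function names, the 10^5 scale and the sample coordinates are my own.

#include <stdint.h>
#include <stdio.h>
#include <math.h>

/* Map a signed value to an unsigned one so small magnitudes stay small
   (zigzag encoding, as used by TWKB and protobuf). */
static uint64_t zigzag(int64_t v) { return ((uint64_t)v << 1) ^ (uint64_t)(v >> 63); }

/* Write an unsigned value as a varint (7 bits per byte); returns bytes written. */
static size_t write_varint(uint64_t v, uint8_t *out)
{
    size_t n = 0;
    while (v >= 0x80) { out[n++] = (uint8_t)(v | 0x80); v >>= 7; }
    out[n++] = (uint8_t)v;
    return n;
}

/* Encode a sequence of longitudes (degrees) at 5 decimal digits (~1 m):
   scale, delta against the previous quantised value, zigzag, varint.
   The first value is encoded as a delta against 0, i.e. in full. */
static size_t encode_track(const double *lon, int n, uint8_t *out)
{
    const double scale = 1e5;
    size_t bytes = 0;
    int64_t prev = 0;
    for (int i = 0; i < n; ++i) {
        int64_t q = (int64_t)llround(lon[i] * scale);
        bytes += write_varint(zigzag(q - prev), out + bytes);
        prev = q;
    }
    return bytes;
}

int main(void)
{
    double lon[] = { 75.223423, 75.223900, 75.224512, 75.225100 };
    uint8_t buf[64];
    size_t n = encode_track(lon, 4, buf);
    printf("4 doubles (32 bytes) -> %zu bytes\n", n);  /* small deltas need 1 byte each */
    return 0;
}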

Upvotes: 1

dmuir

Reputation: 4431

In case you need to write the code yourself, here's a trick for a simple 'delta' encoding scheme that avoids losing accuracy unnecessarily.

The idea is that the compressor should compute deltas NOT between the current data point and the last, but between the current data point and what the decompressor will have computed for the last. This means that the quantisation errors will not add up.

As a simple example that you might want to experiment with, here's C code that compresses doubles into a starting double and a sequence of float deltas:

#include <stdlib.h>

typedef struct
{
    double start;   /* first value, stored exactly */
    int    n;       /* number of deltas */
    float* delta;   /* float deltas, one per subsequent value */
} compT;

/* Good compressor: each delta is taken against the value the decompressor
   will actually have reconstructed, so quantisation errors do not add up. */
compT compress( int n, const double* data)
{
    compT  c = (compT){ data[0], n-1, malloc( (n-1)*sizeof *c.delta) };
    double r = c.start;          /* running reconstruction, mirrors the decompressor */
    for( int i=1; i<n; ++i)
    {
        float d = (float)(data[i]-r);
        c.delta[i-1] = d;
        r += d;                  /* advance by the quantised delta, not the exact one */
    }
    return c;
}

static double* uncompress( compT c)
{
    double* d = malloc( (c.n+1)*sizeof *d);
    double  r = c.start;
    d[0] = r;
    for( int i=1; i<=c.n; ++i)
    {
        r += c.delta[i-1];
        d[i] = r;
    }
    return d;
}

/* Bad compressor: deltas are taken between consecutive originals, so each
   float rounding error is carried forward into every later value. */
compT bad_compress( int n, const double* data)
{
    compT c = (compT){ data[0], n-1, malloc( (n-1)*sizeof *c.delta) };
    for( int i=1; i<n; ++i)
    {
        float d = (float)(data[i]-data[i-1]);
        c.delta[i-1] = d;
    }
    return c;
}

When using floats, the quantisation-error build-up is really only noticeable on long data sequences (millions of points).

However, when compressing harder, the effects become more noticeable. The code below uses an int16_t for the deltas. I've used this in cases where the data values were guaranteed to be around 1 km, so I used a scale (in the code below) of 16, which can cope with differences of up to about 2 km.

#include <stdint.h>
#include <stdlib.h>
#include <math.h>

typedef struct
{
    float    scale;   /* quantisation steps per unit of the data */
    double   start;   /* first value, stored exactly */
    int      n;       /* number of deltas */
    int16_t* delta;   /* quantised deltas */
} compiT;

compiT compressi( int n, const double* data, float scale)
{
    compiT c = (compiT){ scale, data[0], n-1, malloc( (n-1)*sizeof *c.delta) };
    double r = c.start;              /* running reconstruction, mirrors the decompressor */
    for( int i=1; i<n; ++i)
    {
        int16_t d = (int16_t)round(c.scale*(data[i]-r));
        c.delta[i-1] = d;
        r += ((double)d)/c.scale;    /* advance by what the decompressor will add */
    }
    return c;
}

compiT bad_compressi( int n, const double* data, float scale)
{
    compiT c = (compiT){ scale, data[0], n-1, malloc( (n-1)*sizeof *c.delta) };
    for( int i=1; i<n; ++i)
    {
        int16_t d = (int16_t)round(c.scale*(data[i]-data[i-1]));
        c.delta[i-1] = d;            /* rounding errors accumulate over the run */
    }
    return c;
}

static double* uncompressi( compiT c)
{
    double* d = malloc( (c.n+1)*sizeof *d);
    double  r = c.start;
    d[0] = r;
    for( int i=1; i<=c.n; ++i)
    {
        double delta = ((double)c.delta[i-1])/c.scale;
        r += delta;
        d[i] = r;
    }
    return d;
}

In runs of arbitrary length, the maximum error (i.e. the difference between the original data and the decompressed compressed data) was, as it should be, around 3 cm, whereas with the bad compressor the errors were around 0.5 m on runs of 1,000 and 2.5 m on runs of 10,000.

Of course, where you have no guarantee that the differences are bounded, you'll need to incorporate some sort of restart.
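One possible shape for such a restart (my own sketch, not part of dmuir's code above): reserve one int16_t value as an escape marker, and whenever a delta would not fit, emit the marker and store a fresh absolute value in a side array, restarting the delta chain there.

#include <stdint.h>
#include <stdlib.h>
#include <math.h>

/* INT16_MIN is reserved as "take the next value from restart[] and
   restart the delta chain there". */
#define RESTART_MARK INT16_MIN

typedef struct
{
    float    scale;
    double   start;
    int      n;          /* number of deltas */
    int16_t* delta;      /* RESTART_MARK means: read the next restart[] entry */
    double*  restart;    /* absolute values used at restart points */
    int      nrestart;
} comprT;

comprT compress_restart( int n, const double* data, float scale)
{
    comprT c = { scale, data[0], n-1,
                 malloc( (n-1)*sizeof *c.delta),
                 malloc( (n-1)*sizeof *c.restart), 0 };
    double r = c.start;
    for( int i=1; i<n; ++i)
    {
        double d = round(c.scale*(data[i]-r));
        if( d > INT16_MAX || d <= INT16_MIN)     /* delta won't fit: restart */
        {
            c.delta[i-1] = RESTART_MARK;
            c.restart[c.nrestart++] = data[i];
            r = data[i];
        }
        else
        {
            c.delta[i-1] = (int16_t)d;
            r += ((double)c.delta[i-1])/c.scale;
        }
    }
    return c;
}

The decompressor simply mirrors this: on seeing RESTART_MARK it reads the next absolute value from restart[] instead of adding a delta.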

Upvotes: 2

Mark Adler

Reputation: 112394

You don't need an eight-byte floating point format for the number of significant figures you expect, and for the limited range of possible values. I would start by converting the data to a sequence of integers of appropriate length that can represent the accuracy and range of your values. It looks like two four-byte integers should suffice. There's a factor-of-two compression right there.

Then replace each point with the difference of that point and the previous one, except for the first point. Now the integers should be smaller, allowing for any general purpose lossless compressor to reduce that further. If the differences are only around 0.1°, you could get another factor of two.
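A minimal C sketch of those two steps (not Mark Adler's code; the 10^7 scale, roughly 1 cm per step at the equator, and the function name are my own choices):

#include <stdint.h>
#include <math.h>

/* Fixed-point scale: 1e7 steps per degree ~= 1.1 cm at the equator,
   and +/-180 * 1e7 still fits comfortably in a signed 32-bit integer. */
#define DEG_SCALE 1e7

/* Quantize degrees to int32 and replace each value (except the first)
   by its difference from the previous one. */
static void quantize_and_delta(const double *deg, int32_t *out, int n)
{
    int32_t prev = 0;
    for (int i = 0; i < n; ++i) {
        int32_t q = (int32_t)lround(deg[i] * DEG_SCALE);
        out[i] = q - prev;    /* first entry keeps the absolute value */
        prev = q;
    }
    /* feed `out` to any general-purpose lossless compressor afterwards */
}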

That is a simple example, where the prediction for this point is the last point. If your sequence of points represents a path, you could do something more sophisticated and model a velocity. In that case you propagate the last point using the last modeled velocity, and subtract that from the current point.
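A sketch of that constant-velocity variant, operating on the quantized absolute values from the previous sketch (again my own naming; only the residuals would be passed to the lossless compressor):

/* Predict each point as last + (last - before_last), i.e. assume the
   velocity between the two previous points is maintained, and store
   only the prediction error (residual). */
static void velocity_residuals(const int32_t *q, int32_t *out, int n)
{
    for (int i = 0; i < n; ++i) {
        int32_t pred;
        if (i == 0)      pred = 0;                        /* absolute value */
        else if (i == 1) pred = q[0];                     /* plain delta */
        else             pred = q[i-1] + (q[i-1] - q[i-2]);
        out[i] = q[i] - pred;
    }
}

The decoder mirrors this by computing the same prediction and adding the residual back.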

Amendment

I found that the WGS84 reference frame itself is accurate to only 2-5 meters. With three-byte integers, you can get 2.4 meters resolution at the equator. That provides a 62.5% reduction, before differencing and compression. It also gives you a free bit in the latitude, since it has half the range of a longitude. That bit could be used to mark whether this is an absolute coordinate or predicted from the last coordinate.
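As a quick check of that figure: a 3-byte integer gives 2^24 = 16,777,216 steps; spread over the ~40,075 km equatorial circumference that is 40,075,017 m / 16,777,216 ≈ 2.39 m per step, and 6 bytes per point instead of 16 is the 62.5% reduction mentioned above.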

Upvotes: 2
