rfreytag
rfreytag

Reputation: 1193

PostgreSQL Inserted rows differ from select

I have a problem with an INSERT in PostgreSQL. I have this query:

INSERT INTO track_segments(tid, gdid1, gdid2, distance, speed)
SELECT * FROM (
SELECT DISTINCT ON (pga.gdid) 
pga.tid as ntid,
pga.gdid as gdid1, pgb.gdid as gdid2,
ST_Distance(pga.geopoint, pgb.geopoint) AS segdist, 
(ST_Distance(pga.geopoint, pgb.geopoint) / EXTRACT(EPOCH FROM (pgb.timestamp - pga.timestamp + interval '0.1 second'))) as speed
FROM fl_pure_geodata AS pga
LEFT OUTER JOIN fl_pure_geodata AS pgb ON (pga.timestamp < pgb.timestamp AND pga.tid = pgb.tid) 
ORDER BY pga.gdid ASC) AS sq
WHERE sq.gdid2 IS NOT NULL;

to fill a table with pairwise connected segements of geopoints. When I run the SELECT alone I get the correct pairs, but when I use it in the statement above, then some are paired the wrong way or not at all. Here's what I mean:

result of SELECT alone:

tid;gdid1;gdid2;distance;speed
"0f6fd522-5f1e-49a4-b85e-50f11ef7f908";10;11;34.105058803;31.0045989118182
"0f6fd522-5f1e-49a4-b85e-50f11ef7f908";11;12;90.099603143;14.7704267447541
"0f6fd522-5f1e-49a4-b85e-50f11ef7f908";12;13;23.331326565;21.2102968772727

result after INSERT with the same SELECT:

tid;gdid1;gdid2;distance;speed
"0f6fd522-5f1e-49a4-b85e-50f11ef7f908";10;12;122.574;17.2639603638028
"0f6fd522-5f1e-49a4-b85e-50f11ef7f908";11;12;90.0996;14.7704267447541
"0f6fd522-5f1e-49a4-b85e-50f11ef7f908";12;13;23.3313;21.2102968772727

What be the cause of that? It's exactly the same SELECT statement for the INSERT, so why does it give different results?

Upvotes: 0

Views: 66

Answers (2)

Ihor Romanchenko
Ihor Romanchenko

Reputation: 28611

DISTINCT ON (pga.gdid) can pick any row from a set with equal pga.gdid. You can get different result even by execution the same query for several times. Add additional ordering to get consistent results. something like: pga.gdid ASC, pgb.gdid ASC

BTW You may want to order by pga.gdid ASC, pgb.timestamp - pga.timestamp ASC to get the "next" point.

BTW2 It may be easier to use lead() or lag() window functions to calculate differences between current row and next/previous. This way you wont need a self join and will likely get better performance.

Upvotes: 2

Castillo
Castillo

Reputation: 145

You are ordering your query results only by the column pga.gdid, which is the same in all the rows, so postgres will order the results in a different way each time you do the select query.

Upvotes: 1

Related Questions