Kamil Gosciminski
Kamil Gosciminski

Reputation: 17157

Using BETWEEN operator with timestamp values in Postgres

Recently I've been informed by a StackOverflow user that using BETWEEN operator with values of data type timestamp without time zone should not be used. Below is the quote.

Between means >= and <= and shall not be used with ranges that contain timestamps.

When asked for an explanation of this thesis or a link to Postgres documentation where it states that I've got an answer saying

Why would such a simple thing need a site with documentation. I am sure you can find many anyway if you google (at least my detailed posts on various forums demonstrating the case)

Well I googled. And found nothing that would advise against using this operator with timestamp values. In fact this answer on SO uses them and so does this mailing group post.

I was informed that all these years I was doing it wrong. Is it really the case?

As far as I know Postgres max precision for a timestamp is 1 microsecond - correct me if I'm wrong. Thus aren't below statements equivalent ?

sample_date BETWEEN x AND y::timestamp - INTERVAL '1 microsecond'

and

sample_date >= x AND sample_date < y

Edit: The sample is just a consideration of the difference. I'm aware of the fact that developers can miss the time part, but assuming one knows how it behaves, why should it not be used? Generally speaking, this is merely a sample, but I'm wondering about the bigger scope. I've been investigating the planner and it seems to be parsing BETWEEN to >= AND <=.

Why does one preferably write >= AND <= than BETWEEN in the matter of results - not including the time to translate it?

Upvotes: 2

Views: 16364

Answers (3)

decibel
decibel

Reputation: 421

sample_date BETWEEN x AND y::timestamp - INTERVAL '1 microsecond'

... might actually still be broken due to leap seconds. But lets ignore that...

IMO the biggest reason not to use BETWEEN is because what it tells the database to do doesn't actually match the way our brains interpret it. If I say "it happened sometime between Jan 3rd and Jan 6th", most (or at least many) people will translate that into the interval "[Jan 3rd, Jan 6th)", which is definitely NOT what you get with the SQL expression "BETWEEN 'Jan 3rd' AND 'Jan 6th'". So even though that SQL expression has a completely defined meaning, that meaning conflicts with how most/many people read it in English. But if you use >= AND <, there's absolutely no question about what the intention is.

Upvotes: 0

James
James

Reputation: 126

I assume this is a reference the SO user would have shared: https://wiki.postgresql.org/wiki/Don't_Do_This

Don't use BETWEEN with timestamps

I used BETWEEN with two timestamps in a recent query and it worked but took much longer than expected. So, I can't verify any specific issue except, perhaps, performance; although this could be due to my giant data set.

I retried the same query with the >= timestamp1 AND < timestamp2 pattern and performance was about the same.

I suppose a real concern here might be using timestamps without timezone and being sure you're getting the right records?

Upvotes: 0

Laurenz Albe
Laurenz Albe

Reputation: 246788

There is absolutely nothing wrong with using ts BETWEEN validfrom AND validto instead of ts >= validform AND ts <= validto. They are the same.

I can only guess, but I'd say that the warning targets something different, namely whether either of the (identical) clauses above are the right thing to use.

Now this of course depends on what you are trying to do, but very often clauses like this are used to identify the one valid row for a particular timestamp. In that case the clause as above is wrong, because for a value of ts when the row was changed, you would get two results.

Consider this:

CREATE TABLE names (
   id integer PRIMARY KEY,
   val text NOT NULL,
   validfrom timestamptz NOT NULL,
   validto timestamptz NOT NULL
);

INSERT INTO names VALUES (1, 'Smith', '1985-05-02 00:00:00', '2009-01-30 00:00:00');
INSERT INTO names VALUES (2, 'Jones', '2009-01-30 00:00:00', 'infinity');

This is meant to be a historized table of names for a person.

If you use a WHERE clause like above to query for the name valid at a certain time, it would work well for

SELECT val FROM names
WHERE current_timestamp BETWEEN validfrom AND validto;

But it would do the wrong thing for

SELECT val FROM names
WHERE '2009-01-30' BETWEEN validfrom AND validto;

That is because the end point of the interval of validity for a name is not part of the interval. For this case, it would be correct to write:

SELECT val FROM names
WHERE '2009-01-30' >= validfrom AND '2009-01-30' < validto;

Upvotes: 2

Related Questions