Dave Jarvis
Dave Jarvis

Reputation: 31171

Calculate year for end date: PostgreSQL

Background

Users can pick dates as shown in the following screen shot:

Any starting month/day and ending month/day combinations are valid, such as:

The second combination is difficult (I call it the "tricky date scenario") because the year for the ending month/day should come after the year for the starting month/day. That is to say, for the year 1900 (also shown selected in the screen shot above), the full dates would be:

Problem

Writing a SQL statement that selects values from a table with dates that fall between the start month/day and end month/day, regardless of how the start and end days are selected. In other words, this is a year wrapping problem.

Inputs

The query receives as parameters:

Previous Attempt

Consider the following MySQL code (that worked):

end_year = start_year +
  greatest( -1 *
    sign(
      datediff(
        date(
          concat_ws('-', year, end_month, end_day )
        ),
        date(
          concat_ws('-', year, start_month, start_day )
        )
      )
    ), 0
  )

How it works, with respect to the tricky date scenario:

  1. Create two dates in the current year.
  2. The first date is Dec 22, 1900 and the second date is Feb 28, 1900.
  3. Count the difference, in days, between the two dates.
  4. If the result is negative, it means the year for the second date must be incremented by 1. In this case:
    • Add 1 to the current year.
    • Create a new end date: Feb 28, 1901.
    • Check to see if the date range for the data falls between the start and calculated end date.
  5. If the result is positive, the dates have been provided in chronological order and nothing special needs to be done.

This worked in MySQL because the difference in dates would be positive or negative.In PostgreSQL, the equivalent functionality always returns a positive number, regardless of their relative chronological order. (My tests of subtracting dates may have been incorrect, though.)

Question

How should the following (broken) code be rewritten for PostgreSQL to take into consideration the relative chronological order of the starting and ending month/day pairs (with respect to introducing that tricky annual temporal displacement)?

SELECT
  m.amount
FROM
  measurement m
WHERE
  (extract(MONTH FROM m.taken) >= month1 AND
  extract(DAY FROM m.taken) >= day1) AND
  (extract(MONTH FROM m.taken) <= month2 AND
  extract(DAY FROM m.taken) <= day2)

Any thoughts, comments, or questions?

(The dates are pre-parsed into MM/DD format in PHP. My preference is for a pure PostgreSQL solution, but I am open to suggestions on what might make the problem simpler using PHP. The SQL lives in a stored procedure that is called by JasperReports, so the only items PHP can touch are the dates passed into the reporting engine.)

Update #1

The following code nearly works:

select
  (extract(YEAR FROM m.taken)||'-12-12')::date as start_date,
  ((extract(YEAR FROM m.taken)+
      greatest(
        0,
        sign(
          (extract(YEAR FROM m.taken)||'-12-12')::date -
          (extract(YEAR FROM m.taken)||'-02-01')::date) ))||'-02-01')::date as end_date,
  m.taken,
  extract(YEAR FROM m.taken) as year_taken
from
  measurement m
where
  m.station_id = 200 AND
  m.category_id = 1 AND
  (m.taken, m.taken) OVERLAPS
    ((extract(YEAR FROM m.taken)||'-12-12')::date,
    ((extract(YEAR FROM m.taken)+
      greatest(
        0,
        sign(
          (extract(YEAR FROM m.taken)||'-12-12')::date -
          (extract(YEAR FROM m.taken)||'-02-01')::date) ))||'-02-01')::date)

It produces the correct start and end date constraints:

start_date  ;end_date    ;taken       ;year_taken
"1969-12-12";"1970-02-01";"1969-12-12";1969
"1969-12-12";"1970-02-01";"1969-12-13";1969
...
"1969-12-12";"1970-02-01";"1969-12-31";1969
"1970-12-12";"1971-02-01";"1970-12-12";1970

However, the values between 1970-01-01 and 1970-02-01 are missing, but must be included.

Versions

PostgreSQL 8.4.4 and PHP 5.2.10

Upvotes: 2

Views: 3290

Answers (1)

Artefacto
Artefacto

Reputation: 97825

SELECT
    m.amount
FROM
    measurement m,
    to_date('Dec 01 1900', 'Mon DD YYYY') AS A(d1),
    to_date('Feb 28 1900', 'Mon DD YYYY') AS B(d2),
    to_date('Feb 28 1901', 'Mon DD YYYY') AS C(d3)
WHERE m.taken
    BETWEEN
        d1 AND
        CASE WHEN d2 < d1 THEN d3 ELSE d2 END

References: conditional expressions, data type formatting functions.

EDIT: Sorry, I thought you wanted a specific year:

SELECT
    amount
FROM
    (SELECT
        M.amount, M.taken,
        to_date('Dec 01 ' || extract(YEAR FROM M.taken), 'Mon DD YYYY'),
        to_date('Feb 28 ' || extract(YEAR FROM M.taken), 'Mon DD YYYY')
     FROM
        measurement AS M
     ) AS A(amount, taken, d1, d2)
WHERE
    (d2 >= d1 AND taken BETWEEN d1 AND d2)
    OR
    (d2 < d1 AND (taken <= d2 OR taken >= d1));

If the set is big, this doesn't have a lot of chances for optimization. In that case, you can have an SQL function that converts all the dates to a certain year (say taken - (extract(year from taken) - 1900) * '1 year'::interval) and then compare with Dec 02 1900 and Feb 28 1900. That way you can index the result of this date conversion function AND you don't have to calculate two dates for each entry.

Upvotes: 1

Related Questions