Reputation: 57

Connect by return more rows than expected

I am making a query for retrieving "future months" based on a project's forecast budget.

Basically I am taking the last forecasted date (START_DATE) and the future date which I wish to make the forecast to (END_DATE) so I need to fill everything in between with months.

Through some research I found "CONNECT BY" could help a lot.

Simplyfying it, the query looks like this:

SELECT     TO_CHAR (ADD_MONTHS (START_DATE, LEVEL - 1), 'fmMonth')
     FROM (SELECT PROJECT_ID, FORECAST_VALUE, START_DATE, END_DATE
             FROM PROJECTS
           WHERE PROJECT_ID = 001)
CONNECT BY LEVEL <=
               MONTHS_BETWEEN (TRUNC (END_DATE, 'MM'),
                               TRUNC (START_DATE, 'MM')
                              )
             * +1

The query works just fine when selecting one project, however when selecting several or all projects/rows, the query breaks down and returns many more rows than expected.

The source data look something like this:

PROJECT_ID | FORECAST_VALUE | START_DATE | END_DATE 
-----------+----------------+------------+-----------
 001       |  100           | 2017-01-01 | 2017-03-01
 002       |  200           | 2017-01-01 | 2017-05-01 
 003       |  200           | 2017-10-01 | 2018-01-01

What I would expect to see would be something like this

PROJECT_ID | FORECAST_VALUE | FORECAST_YEAR | FORECAST_MONTH 
-----------+----------------+---------------+-----------
 001       |  100           | 2017          | JANUARY
 001       |  100           | 2017          | FEBRUARY
 001       |  100           | 2017          | MARCH

 002       |  200           | 2017          | JANUARY
 002       |  200           | 2017          | FEBRUARY
 002       |  200           | 2017          | MARCH
 002       |  200           | 2017          | APRIL
 002       |  200           | 2017          | MAY

 003       |  200           | 2017          | OCTOBER
 003       |  200           | 2017          | NOVEMBER
 003       |  200           | 2017          | DECEMBER
 003       |  200           | 2018          | JANUARY

However I get a ton more months and years than expected.

How can I solve this?

Thanks!

Upvotes: 2

Answers (3)

user5683823

Reputation:

Since you have no condition other than the one you put in the CONNECT BY, each row at each level generates many more rows at the next level (there is no tracking of each PROJECT_ID at each level). You need to link the rows by PROJECT_ID = PRIOR PROJECT_ID. But this will lead to "cycles"; CONNECT BY... detects cycles by looking at the columns affected by the PRIOR operator only, not at ALL the columns. You can break cycles by adding an irrelevant PRIOR condition that will guarantee different values for different rows; traditionally SYS_GUID() is used for that.

Modify your query as follows:

SELECT     TO_CHAR (ADD_MONTHS (START_DATE, LEVEL - 1), 'fmMonth')
     FROM (SELECT PROJECT_ID, FORECAST_VALUE, START_DATE, END_DATE
             FROM PROJECTS
           WHERE PROJECT_ID = 001)
CONNECT BY LEVEL <=
               MONTHS_BETWEEN (TRUNC (END_DATE, 'MM'),
                               TRUNC (START_DATE, 'MM')
                              )
             * +1         -- whatever that means (copied from original post)
       AND PROJECT_ID = PRIOR PROJECT_ID
       AND PRIOR SYS_GUID() IS NOT NULL

I am assuming, of course, that PROJECT_ID is a unique key (perhaps Primary Key?) in the base table PROJECTS.

Upvotes: 1

Nick Krasnov

Reputation: 27261

Here is one way to do it. We simply take minimum start_date and maximum end_date and generate everything in between, and then join to our projects table.

create table projects(project_id, forecast_value, start_date, end_date) as(
  select 001, 100, date '2017-01-01', date '2017-03-01' from dual union all
  select 002, 200, date '2017-01-01', date '2017-05-01' from dual union all 
  select 003, 200, date '2017-10-01', date '2018-01-01' from dual
 );


with 
   dates(dt) as(
       select add_months(s_date, level - 1) as dt
      from (
            select min(start_date) as s_date
                 , max(end_date)   as e_date
              from projects
            ) 
       connect by add_months(s_date , level - 1) <= e_date
       )
select p.project_id
     , p.forecast_value
     , extract(year from d.dt) as forcast_year
     , to_char(d.dt, 'MONTH') as forecast_month
 from projects p
 join dates d
   on (trunc(d.dt, 'mm') between trunc(p.start_date, 'mm') 
                             and trunc(p.end_date, 'mm'))
order by p.project_id, d.dt

Result:

PROJECT_ID FORECAST_VALUE FORCAST_YEAR FORECAST_MONTH
---------- -------------- ------------ --------------
         1            100         2017 JANUARY       
         1            100         2017 FEBRUARY      
         1            100         2017 MARCH         
         2            200         2017 JANUARY       
         2            200         2017 FEBRUARY      
         2            200         2017 MARCH         
         2            200         2017 APRIL         
         2            200         2017 MAY           
         3            200         2017 OCTOBER       
         3            200         2017 NOVEMBER      
         3            200         2017 DECEMBER      
         3            200         2018 JANUARY       

12 rows selected.

Upvotes: 0

Aleksej

Reputation: 22969

A simple way could be joining your table with a table of numbers, assuming that you will have no more than, say, 1000 months:

select PROJECT_ID, FORECAST_VALUE, START_DATE, END_DATE, TO_CHAR (ADD_MONTHS (START_DATE, num - 1), 'fmMonth')
from PROJECTS
inner join (           
            select level as num
            from dual
            connect by level <= 1000
           ) nums           
on (num -1 <= months_between( TRUNC (END_DATE, 'MM'),
                             TRUNC (START_DATE, 'MM'))   ) 
order by 1, num

Upvotes: 0

Connect by return more rows than expected

Answers (3)

Related Questions