user-2147482428
user-2147482428

Reputation: 167

Fill in missing rows based on mixed dates

I am performing a mapping in Informatica IICS and attempting to fill in missing rows in my dataset based on several fields.

Below is a sample table of the data. There is an ID field, a Week_Start field that is the starting date of the week the data is reported for, a corresponding Week_Number and a Year field that either specifies the data as belonging to the prior year or the current year. Sales is the number of sales made by that specific ID and Sales_Type is the category of sales.

There are dates where the specific person did not make a sale, however, and so the row corresponding to that data is missing. I would like to fill in those rows with all the related information, and the Sales field set to 0.

My actual data has a 6 week window of information, for both prior and current year, for 7 different sales types. So I'd expect to have 6x2x7 = 84 rows per ID. i.e. If I have 100 unique IDs, my final table should have 8400 rows.

Table with rows missing:

+----+------------+-------------+---------+-------+------------+
| ID | Week_Start | Week_Number |  Year   | Sales | Sales_Type |
+----+------------+-------------+---------+-------+------------+
|  1 | 01/01/2018 |           1 | Prior   |     1 | A          |
|  1 | 01/08/2018 |           2 | Prior   |     3 | A          |
|  1 | 01/15/2018 |           3 | Prior   |     3 | A          |
|  1 | 01/29/2018 |           5 | Prior   |     4 | A          |
|  1 | 01/01/2019 |           1 | Current |     2 | A          |
|  1 | 01/08/2019 |           2 | Current |     4 | A          |
|  1 | 01/15/2019 |           3 | Current |     1 | A          |
|  1 | 01/22/2019 |           4 | Current |     1 | A          |
|  1 | 01/01/2018 |           1 | Prior   |     1 | B          |
|  1 | 01/08/2018 |           2 | Prior   |     3 | B          |
|  1 | 01/15/2018 |           3 | Prior   |     3 | B          |
|  1 | 01/29/2018 |           5 | Prior   |     4 | B          |
|  1 | 01/01/2019 |           1 | Current |     2 | B          |
|  1 | 01/08/2019 |           2 | Current |     4 | B          |
|  1 | 01/15/2019 |           3 | Current |     1 | B          |
|  1 | 01/22/2019 |           4 | Current |     1 | B          |
+----+------------+-------------+---------+-------+------------+

Expected result with the missing rows filled in:

+----+------------+-------------+---------+-------+------------+
| ID | Week_Start | Week_Number |  Year   | Sales | Sales_Type |
+----+------------+-------------+---------+-------+------------+
|  1 | 01/01/2018 |           1 | Prior   |     1 | A          |
|  1 | 01/08/2018 |           2 | Prior   |     3 | A          |
|  1 | 01/15/2018 |           3 | Prior   |     3 | A          |
|  1 | 01/22/2018 |           4 | Prior   |     0 | A          |
|  1 | 01/29/2018 |           5 | Prior   |     4 | A          |
|  1 | 01/01/2019 |           1 | Current |     2 | A          |
|  1 | 01/08/2019 |           2 | Current |     4 | A          |
|  1 | 01/15/2019 |           3 | Current |     1 | A          |
|  1 | 01/22/2019 |           4 | Current |     1 | A          |
|  1 | 01/29/2019 |           5 | Current |     0 | A          |
|  1 | 01/01/2018 |           1 | Prior   |     1 | B          |
|  1 | 01/08/2018 |           2 | Prior   |     3 | B          |
|  1 | 01/15/2018 |           3 | Prior   |     3 | B          |
|  1 | 01/22/2018 |           4 | Prior   |     0 | B          |
|  1 | 01/29/2018 |           5 | Prior   |     4 | B          |
|  1 | 01/01/2019 |           1 | Current |     2 | B          |
|  1 | 01/08/2019 |           2 | Current |     4 | B          |
|  1 | 01/15/2019 |           3 | Current |     1 | B          |
|  1 | 01/22/2019 |           4 | Current |     1 | B          |
|  1 | 01/29/2019 |           5 | Current |     0 | B          |
+----+------------+-------------+---------+-------+------------+

I have tried to use transformations within ICS but none of them accomplish what I'm trying to do. My best guess as to how to do this is by using a Recursive CTE in SQL and pulling in a SQL script to generate these missing rows.

My question is, how do I do this over several partitions? It is not just missing dates I'm interested in, it's missing dates for two years and several different types of sales. This is further complicated by the fact that the Week_Start column contains mixed data. My early attempts at doing this ended up generating all rows between a date in 2018 and a data in 2019.

Upvotes: 1

Views: 109

Answers (1)

Gordon Linoff
Gordon Linoff

Reputation: 1270553

Use cross join to generate the rows and left join to bring in the values:

select w.week_start, w.week_number, ys.year, ys.sales_type,
       coalesce(t.sales, 0) as sales
from (select distinct week_start, week_number from t) w cross join
     (select distinct year, sales_type from t) ys left join
     t
     on t.week_start = w.week_start and
        t.year = ys.year and
        t.sales_type = ys.sales_type;

Upvotes: 1

Related Questions