jrara
jrara

Reputation: 17011

How to fill missing values using analytical functions?

I would like to fill missing null values from my dataset. I have a data set like this

+---------------------+------+-------------+
| ORDER_DATE          | SHOP | SALESPERSON |
+---------------------+------+-------------+
| 14/04/2017 04:44:27 | A    | MIKE        |
+---------------------+------+-------------+
| 14/04/2017 04:44:55 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:07 | A    | TIM         |
+---------------------+------+-------------+
| 14/04/2017 04:45:30 | A    |             |
+---------------------+------+-------------+
| 14/04/2017 04:45:43 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:46:13 | B    | JOHN        |
+---------------------+------+-------------+
| 14/04/2017 04:46:28 | B    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:32 | C    |             |
+---------------------+------+-------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |
+---------------------+------+-------------+

and I like to fill the salesperson information partitioned by shop using the first found value before the null values within the shop. I tried this but this does not produce the correct result (below). How to solve this?

CREATE TABLE SALES (
ORDER_DATE DATE, 
SHOP VARCHAR2(30 CHAR), 
SALESPERSON VARCHAR2(30 CHAR)
)
;

REM INSERTING INTO SALES
SET DEFINE OFF;
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:27','DD/MM/YYYY HH24:MI:SS'),'A','MIKE');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:44:55','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:07','DD/MM/YYYY HH24:MI:SS'),'A','TIM');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:30','DD/MM/YYYY HH24:MI:SS'),'A',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:45:43','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:13','DD/MM/YYYY HH24:MI:SS'),'B','JOHN');
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:46:28','DD/MM/YYYY HH24:MI:SS'),'B',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:32','DD/MM/YYYY HH24:MI:SS'),'C',NULL);
INSERT INTO SALES (ORDER_DATE,SHOP,SALESPERSON) VALUES (TO_DATE('14/04/2017 04:58:41','DD/MM/YYYY HH24:MI:SS'),'C','MELINDA');
COMMIT;

SELECT * FROM SALES ORDER BY SHOP, ORDER_DATE;

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,
       /*tried two approaches*/
       /*does not produce a correct result set*/
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER (PARTITION BY SHOP
                   ORDER BY ORDER_DATE RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING) AS LAST_VALUE_1,
       /*this also does not solve this*/            
       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER(PARTITION BY SHOP
                  ORDER BY ORDER_DATE ROWS BETWEEN UNBOUNDED PRECEDING AND 1 PRECEDING) AS LAST_VALUE_2
FROM SALES ;

The correct result set is:

+---------------------+------+-------------+--------------------+
| ORDER_DATE          | SHOP | SALESPERSON | SALESPERSON_FILLED |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:27 | A    | MIKE        |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:44:55 | A    |             |  MIKE              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:07 | A    | TIM         |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:30 | A    |             |  TIM               |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:45:43 | B    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:13 | B    | JOHN        |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:46:28 | B    |             |  JOHN              |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:32 | C    |             |                    |
+---------------------+------+-------------+--------------------+
| 14/04/2017 04:58:41 | C    | MELINDA     |  MELINDA           |
+---------------------+------+-------------+--------------------+

Upvotes: 4

Views: 554

Answers (3)

ITSGuru
ITSGuru

Reputation: 194

In SQL SERVER 2008, Using By CASE WHEN, ORDER BY :

Code :

 SELECT CONVERT(DATETIME,ORDER_DATE) AS ORDER_DATE,
    ISNULL(SHOP,'') AS SHOP,
    ISNULL(SALESPERSON,'') AS SALESPERSON,
    CASE WHEN SALESPERSON IS NULL OR SALESPERSON = '' THEN 
    ISNULL((SELECT TOP 1 ISNULL(SALESPERSON,'')  FROM SALES_stack WHERE ORDER_DATE < S.ORDER_DATE
    AND ISNULL(SHOP,'') = ISNULL(S.SHOP,'') 
    ORDER BY ORDER_DATE DESC),'')
    ELSE  ISNULL(SALESPERSON,'') END AS SALESPERSON_FILLED FROM SALES_stack S

OUTPUT :

    ORDER_DATE              SHOP   SALESPERSON  SALESPERSON_FILLED
    2017-04-14 04:44:27.000 A      MIKE         MIKE
    2017-04-14 04:44:55.000 A                   MIKE
    2017-04-14 04:45:07.000 A      TIM          TIM
    2017-04-14 04:45:30.000 A                   TIM
    2017-04-14 04:45:43.000 B       
    2017-04-14 04:46:13.000 B      JOHN         JOHN
    2017-04-14 04:46:28.000 B                   JOHN
    2017-04-14 04:58:32.000 C       
    2017-04-14 04:58:41.000 C      MELINDA      MELINDA

Upvotes: 0

krokodilko
krokodilko

Reputation: 36127

You was very close.
Try this:

SELECT ORDER_DATE,
       SHOP,
       SALESPERSON,

       LAST_VALUE(SALESPERSON) IGNORE NULLS OVER 
            (PARTITION BY SHOP ORDER BY ORDER_DATE ) AS LAST_VALUE_1

FROM SALES
order by shop, order_date;

ORDER_DA SHOP                           SALESPERSON                    LAST_VALUE_1                  
-------- ------------------------------ ------------------------------ ------------------------------
17/04/14 A                              MIKE                           MIKE                          
17/04/14 A                                                             MIKE                          
17/04/14 A                              TIM                            TIM                           
17/04/14 A                                                             TIM                           
17/04/14 B                                                                                           
17/04/14 B                              JOHN                           JOHN                          
17/04/14 B                                                             JOHN                          
17/04/14 C                                                                                           
17/04/14 C                              MELINDA                        MELINDA                       

9 rows selected. 

Upvotes: 3

Giorgos Betsos
Giorgos Betsos

Reputation: 72185

The following query works in SQL Server. I don't see any reason why it shouldn't work in Oracle:

SELECT ORDER_DATE, SHOP, 
       MAX(SALESPERSON) OVER (PARTITION BY SHOP, grp) AS SALESPERSON
FROM (
   SELECT ORDER_DATE, SHOP, SALESPERSON,
          SUM(CASE WHEN SALESPERSON IS NOT NULL THEN 1 END) 
          OVER
          (PARTITION BY SHOP ORDER BY ORDER_DATE) AS grp
   FROM mytable) AS t
ORDER BY ORDER_DATE

This is what the inner query produces:

ORDER_DATE              SHOP SALESPERSON  grp
---------------------------------------------
2017-04-14 04:44:27.000 A    MIKE         1
2017-04-14 04:44:55.000 A    NULL         1
2017-04-14 04:45:07.000 A    TIM          2
2017-04-14 04:45:30.000 A    NULL         2
2017-04-14 04:45:43.000 B    NULL         NULL
2017-04-14 04:46:13.000 B    JOHN         1
2017-04-14 04:46:28.000 B    NULL         1
2017-04-14 04:58:32.000 C    NULL         NULL
2017-04-14 04:58:41.000 C    MELINDA      1

So using field grp, together with field SHOP, we can identify an 'island' of records that should share the same SALESPERSON value.

Upvotes: 0

Related Questions