Yanni Pattas
Yanni Pattas

Reputation: 25

Databricks : Equivalent code for SQL query

I'm looking for the equivalent databricks code for the query. I added some sample code and the expected as well, but in particular I'm looking for the equivalent code in Databricks for the query. For the moment I'm stuck on the CROSS APPLY STRING SPLIT part.

Sample SQL data:

    CREATE TABLE FactTurnover
    (
    ID INT,
    SalesPriceExcl NUMERIC (9,4),
  Discount VARCHAR(100)
  )
 INSERT INTO FactTurnover
 VALUES 
   (1, 100, '10'),
   (2, 39.5877, '58, 12'),
   (3, 100, '50, 10, 15'),
   (4, 100, 'B')

Query:

    ;WITH CTE AS
    (
     SELECT Id, SalesPriceExcl, 
         CASE WHEN value = 'B' THEN 0
         ELSE CAST(value as int) END AS Discount
     From FactTurnover
     CROSS APPLY STRING_SPLIT(Discount, ',')
     )
     SELECT Id,  
       Min(SalesPriceExcl) AS SalesPriceExcludingDiscount,
       EXP(SUM(LOG((100 - Discount) / 100.0))) As TotalDiscount,
       Cast(EXP(SUM(LOG((100 - Discount) / 100.0))) * 
            MIN(SalesPriceExcl) As Numeric(9,2))
        PriceAfterDiscount
     FROM CTE
     GROUP BY ID

Expected Results:

| Id | SalesPriceExcludingDiscount |       TotalDiscount | PriceAfterDiscount |

|----|-----------------------------|---------------------|--------------------|

|  1 |                         100 |                 0.9 |                 90 |

|  2 |                     39.5877 | 0.36960000000000004 |              14.63 |

|  3 |                         100 | 0.38250000000000006 |              38.25 |

|  4 |                         100 |                   1 |                100 |

Upvotes: 2

Views: 3279

Answers (1)

wBob
wBob

Reputation: 14389

Use SPLIT to convert the comma-separated string to an array then use LATERAL VIEW and EXPLODE to do operations on the elements of that array. The roughly equivalent syntax (including CTEs) is:

%sql
--SELECT * FROM FactTurnover;

WITH cte AS
(
SELECT *
FROM
  (
  SELECT Id, SalesPriceExcl, SPLIT ( Discount, ',' ) AS discountArray
  FROM FactTurnover
  ) x
  LATERAL VIEW EXPLODE ( discountArray ) x AS xdiscount
)
SELECT 
  Id,
  MIN(SalesPriceExcl) AS SalesPriceExcludingDiscount,
  EXP ( SUM( LOG( ( 100 - xdiscount ) / 100.00 ) ) ) AS TotalDiscount
FROM cte
GROUP BY Id
ORDER BY Id

If you are feeling brave, you could also do this using higher order functions. I've included two examples below. I would say these are harder to debug and you should probably try them performance-wise, it depends what you're comfortable with:

%sql
-- Convert Discount text column to array with SPLIT function and filter out value 'B' from the array
;WITH filterB AS (
SELECT *, FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ) discountArray
FROM FactTurnover
), cte1 AS (
-- Do initial calcs on array
SELECT 
  Id,
  TRANSFORM ( discountArray, discountArray -> LOG( ( 100 - discountArray ) / 100.00 ) ) discountArray2
FROM filterB
)
SELECT
  Id,
  EXP( AGGREGATE ( discountArray2, CAST( 0 AS DOUBLE ), ( x, y ) -> x + y ) ) AS x
FROM cte1;

-- all in one example
SELECT 
    Id,
    EXP( AGGREGATE( TRANSFORM( FILTER ( SPLIT ( Discount, ',' ), x -> x != 'B' ), y -> LOG( ( 100 - y ) / 100.00 ) ), CAST( 0 AS DOUBLE ), ( z, a ) -> z + a ) )
    AS final
FROM FactTurnover
ORDER BY Id

Upvotes: 2

Related Questions