user9532692
user9532692

Reputation: 706

Using "FOR XML PATH" in Databricks Spark SQL in sql syntax

I'm using spark version 3.2.1 on databricks (DBR 10.4 LTS), and I'm trying to convert sql server query into a databricks sql query that runs on a spark cluster using spark sql in sql syntax. However, spark sql does not seem to support XML PATH as a SQL function and I wonder if there is an alternative way to convert this sql server query into a sql query that spark sql will accept and run. The original sql server query looks like this and when I run it in databricks:

%sql
DROP TABLE if exists UserCountry;
CREATE TABLE if not exists UserCountry (
  UserID INT,
  Country VARCHAR(5000)
);
INSERT INTO UserCountry
SELECT 
    L.UserID AS UserID,
    COALESCE(
        STUFF(
            (SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
            , 1, 2, '')
        , '') AS Country
FROM 
LK_ETLRunUserID L

When I run the query above in databricks spark sql, I get the following error:

ParseException: 
mismatched input 'FOR' expecting {')', '.', '[', 'AND', 'BETWEEN', 'CLUSTER', 'DISTRIBUTE', 'DIV', 'EXCEPT', 'GROUP', 'HAVING', 'IN', 'INTERSECT', 'IS', 'LIKE', 'ILIKE', 'LIMIT', NOT, 'OR', 'ORDER', 'QUALIFY', RLIKE, 'MINUS', 'SORT', 'UNION', 'WINDOW', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', '&', '|', '||', '^', ':', '::'}(line 6, pos 80)

== SQL ==
INSERT INTO UserCountry
SELECT 
    L.UserID AS UserID,
    COALESCE( 
        STUFF( 
            (SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH ('')) 
--------------------------------------------------------------------^^^
            , 1, 2, '')
        , '') AS Country
FROM 
LK_ETLRunUserID L

Given that the UserStopCountry looks like this:

UserID Country
1 'US'
1 'US'
2 'MEXICO'
2 'US'
3 'US'

I believe the output will be:

UserID Country
1 'US', 'US'
2 'MEXICO', 'US'
3 'US'

Upvotes: 0

Views: 43

Answers (0)

Related Questions