Reputation: 706
I'm using Spark 3.2.1 on Databricks (DBR 10.4 LTS), and I'm trying to convert a SQL Server query into a Databricks SQL query that runs on a Spark cluster using Spark SQL. However, Spark SQL does not seem to support FOR XML PATH, and I wonder whether there is an alternative way to rewrite this SQL Server query so that Spark SQL will accept and run it. The original SQL Server query looks like this, and this is how I run it in Databricks:
%sql
DROP TABLE if exists UserCountry;
CREATE TABLE if not exists UserCountry (
  UserID INT,
  Country VARCHAR(5000)
);

INSERT INTO UserCountry
SELECT
  L.UserID AS UserID,
  COALESCE(
    STUFF(
      (SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
      , 1, 2, '')
    , '') AS Country
FROM
  LK_ETLRunUserID L
When I run the query above in Databricks Spark SQL, I get the following error:
ParseException:
mismatched input 'FOR' expecting {')', '.', '[', 'AND', 'BETWEEN', 'CLUSTER', 'DISTRIBUTE', 'DIV', 'EXCEPT', 'GROUP', 'HAVING', 'IN', 'INTERSECT', 'IS', 'LIKE', 'ILIKE', 'LIMIT', NOT, 'OR', 'ORDER', 'QUALIFY', RLIKE, 'MINUS', 'SORT', 'UNION', 'WINDOW', EQ, '<=>', '<>', '!=', '<', LTE, '>', GTE, '+', '-', '*', '/', '%', '&', '|', '||', '^', ':', '::'}(line 6, pos 80)
== SQL ==
INSERT INTO UserCountry
SELECT
L.UserID AS UserID,
COALESCE(
STUFF(
(SELECT ', ' + LC.Country FROM UserStopCountry LC WHERE L.UserID = LC.UserID FOR XML PATH (''))
--------------------------------------------------------------------^^^
, 1, 2, '')
, '') AS Country
FROM
LK_ETLRunUserID L
Given that the UserStopCountry table looks like this:
UserID | Country |
---|---|
1 | 'US' |
1 | 'US' |
2 | 'MEXICO' |
2 | 'US' |
3 | 'US' |
I believe the output will be:
UserID | Country |
---|---|
1 | 'US', 'US' |
2 | 'MEXICO', 'US' |
3 | 'US' |
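In case it helps frame what I'm looking for, below is an untested sketch of the kind of rewrite I'm imagining, replacing the STUFF(... FOR XML PATH('')) pattern with a GROUP BY plus collect_list and concat_ws. I'm not sure this is the idiomatic Spark SQL approach, and I'm aware the order of the concatenated values may not be guaranteed:

```sql
-- Untested sketch: collect each user's countries into an array with collect_list
-- (which keeps duplicates, unlike collect_set), then join them with concat_ws.
INSERT INTO UserCountry
SELECT
  L.UserID AS UserID,
  COALESCE(concat_ws(', ', collect_list(LC.Country)), '') AS Country
FROM LK_ETLRunUserID L
LEFT JOIN UserStopCountry LC
  ON L.UserID = LC.UserID
GROUP BY L.UserID;
```

Since concat_ws only inserts the separator between elements, the STUFF call that trims the leading ', ' should no longer be needed, but I haven't verified that this produces exactly the output shown above.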
Upvotes: 0
Views: 43