DenStudent
DenStudent

Reputation: 928

Using STRING_SPLIT for 2 columns in a single table

I've started from a table like this

ID | City                               | Sales
1  | London,New York,Paris,Berlin,Madrid| 20,30,,50
2  | Istanbul,Tokyo,Brussels            | 4,5,6

There can be an unlimited amount of cities and/or sales.

I need to get each city and their salesamount their own record. So my result should look something like this:

ID | City                               | Sales
1  | London                             | 20
1  | New York                           | 30
1  | Paris                              | 
1  | Berlin                             | 50
1  | Madrid                             | 
2  | Istanbul                           | 4 
2  | Tokyo                              | 5
2  | Brussels                           | 6  

What I got so far is

SELECT ID, splitC.Value, splitS.Value
FROM Table
CROSS APLLY STRING_SPLIT(Table.City,',') splitC
CROSS APLLY STRING_SPLIT(Table.Sales,',') splitS

With one cross apply, this works perfectly. But when executing the query with a second one, it starts to multiply the number of records a lot (which makes sense I think, because it's trying to split the sales for each city again).

What would be an option to solve this issue? STRING_SPLIT is not neccesary, it's just how I started on it.

Upvotes: 0

Views: 5999

Answers (2)

Zhorov
Zhorov

Reputation: 30023

STRING_SPLIT() is not an option, because (as is mentioned in the documantation) the output rows might be in any order and the order is not guaranteed to match the order of the substrings in the input string.

But you may try with a JSON-based approach, using OPENJSON() and string transformation (comma-separated values are transformed into a valid JSON array - London,New York,Paris,Berlin,Madrid into ["London","New York","Paris","Berlin","Madrid"]). The result from the OPENJSON() with default schema is a table with columns key, value and type and the key column is the 0-based index of each item in this array:

Table:

CREATE TABLE Data (
   ID int,
   City varchar(1000),
   Sales varchar(1000)
)
INSERT INTO Data 
   (ID, City, Sales)
VALUES   
   (1, 'London,New York,Paris,Berlin,Madrid', '20,30,,50'),
   (2, 'Istanbul,Tokyo,Brussels',             '4,5,6')

Statement:

SELECT d.ID, a.City, a.Sales
FROM Data d
CROSS APPLY (
   SELECT c.[value] AS City, s.[value] AS Sales
   FROM OPENJSON(CONCAT('["', REPLACE(d.City, ',', '","'), '"]')) c
   LEFT OUTER JOIN OPENJSON(CONCAT('["', REPLACE(d.Sales, ',', '","'), '"]')) s 
      ON c.[key] = s.[key]
) a 

Result:

ID  City     Sales
1   London   20
1   New York 30
1   Paris   
1   Berlin   50
1   Madrid   NULL
2   Istanbul 4
2   Tokyo    5
2   Brussels 6

Upvotes: 2

Thom A
Thom A

Reputation: 96028

STRING_SPLIT has no context of what oridinal positions are. In fact, the documentation specifically states that it doesn't care about it:

The order of the output may vary as the order is not guaranteed to match the order of the substrings in the input string.

As a result, you need to use something that is aware of such basic things, such as DelimitedSplit8k_LEAD.

Then you can do something like this:

WITH Cities AS(
    SELECT ID,
           DSc.Item,
           DSc.ItemNumber
    FROM dbo.YourTable YT
         CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.City,',') DSc)
Sales AS(
    SELECT ID,
           DSs.Item,
           DSs.ItemNumber
    FROM dbo.YourTable YT
         CROSS APPLY dbo.DelimitedSplit8k_LEAD(YT.Sales,',') DSs)
SELECT ISNULL(C.ID,S.ID) AS ID,
       C.Item AS City,
       S.Item AS Sale
FROM Cities C
     FULL OUTER JOIN Sales S ON C.ItemNumber = S.ItemNumber;

Of course, however, the real solution is fix your design. This type of design is going to only cause you 100's of problems in the future. Fix it now, not later; you'll reap so many rewards sooner the earlier you do it.

Upvotes: 1

Related Questions