creedcode
creedcode

Reputation: 13

How to split strings in columns, into rows (on two rows, that match position) SQL

Ok, here we go. My first ever post and question.

I'm using SQL server 2019. I have a table, im interested in two columns from this table.

Both columns contain array-like data (they are both a set of strings, comma delimited. I want to take each substring from the columns and show them alone in their own 'cell' and instead have many rows.

Now, I can get it to work with one of the columns using STRING_SPLIT. However, when I use string split on both columns, I dont get the order I need. i.e for example see below

column1: 'my_fav_drink','my_fav_colour','my_fav_avenger'

column2: 'coffee','blue','dr_strange'

The position of the strings will always match, so the 3rd entry in column 1 will always relate to the 3rd entry in column 2.

But when I run the string split, the orders get messed up

I have tried self joins, sub selects and messed around with order by's but I just can't seem to get it working in order.

I have searched online for while, so Im not just coming to the community without first trying, and I will appreciate any help or guidance, and I looked here for the answer of course but I have not had any luck.

Update: Please consider the following:

  1. This was not built by me, I'm diving into this without a tech specification having to go through the DB, i'm sure some of you will know my pain. If I get approval, I will re-design this but I can't for now (I could make another table though). The data comes in automatically from a machine like this as a list of strings

  2. There are about 50 values in each column

Desired output is:

desired_output_table

The whole point of this is so that I can do analysis in another software (again, was a request) but the array can't be sorted out in the software.

Thanks guys,

Much appreciated

Upvotes: 0

Views: 1140

Answers (3)

Harish C
Harish C

Reputation: 11

SELECT d.column1,b.value as Label
FROM Data d
CROSS APPLY string_split(cast(d.column1 as varchar(max)),',')b

SELECT d.column2,c.value as Value
FROM Data d
CROSS APPLY string_split(cast(d.column2 as varchar(max)),',')c

Try this approach

Upvotes: 1

Zhorov
Zhorov

Reputation: 29983

Probably the best solution is to change the design, but if you need to parse the input data, you may a try a JSON-based approach. The idea is to transform the values from column1 and column2 columns into valid JSON arrays (my_fav_drink, my_fav_colour, my_fav_avenger into ["my_fav_drink", "my_fav_colour", "my_fav_avenger"]) and parse the arrays using OPENJSON() and default schema. The result from the OPENJSON() execution is a table with columns key, value and type and the values in the key column are the indexes of the elements in the input array.

Table:

CREATE TABLE Data (
   column1 varchar(100),
   column2 varchar(100)
)
INSERT INTO Data (column1, column2)
VALUES ('my_fav_drink,my_fav_colour,my_fav_avenger', 'coffee,blue,dr_strange')

Statement:

SELECT CONVERT(int, j1.[key]) + 1 AS [Number], j1.[value] AS [Label], j2.[value] AS [Value]
FROM Data d
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(d.column1, 'json'), ',', '","'), '"]')) j1
CROSS APPLY OPENJSON(CONCAT('["', REPLACE(STRING_ESCAPE(d.column2, 'json'), ',', '","'), '"]')) j2
WHERE j1.[key] = j2.[key]

Result:

Number  Label           Value
1       my_fav_drink    coffee
2       my_fav_colour   blue
3       my_fav_avenger  dr_strange

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1270361

This is a bit long for a comment.

Your method of storing values is really, really bad. For instance, guaranteeing that the two columns have the same number of values is troublesome. And then imagine what happens if one of the values could have a comma!

There are basically four alternatives.

  1. If you know what the columns are, store them! If doesn't matter if some rows have NULL values. That's fine. If you want to get fancy, you might have different sets of columns stored in different tables.
  2. A variation on (1) is to use sparse columns. This allows more columns than would normally be allowed on the row -- assuming most of the values are NULL.
  3. A key-value store. That is, you would have another table with multiple rows per "entity" and one row per key/value pair. Such an EAV data model (entity-attribute-value) can be quite powerful, but it generally assumes that all values are the same type and precludes the definition of foreign key relationships.
  4. Use XML or JSON to encode the values. In conjunction with putting used fields in column, this provides extensibility and often reasonable performance.

Note that your method of storing multiple values in a string with headers in another string is not on this list.

Upvotes: 0

Related Questions