Chipmunk_da
Chipmunk_da

Reputation: 507

SQL - split data into two columns based on identifier

I have a table that contains the product categories/headers and product names but all in one column. I need to split it out into separate Category and Product columns. I also have a helper column Header_Flag which can be used to determine if the row contains the header or product name. The input table looks like this:

|---------------------|------------------|
|      Product        |     Header_Flag  |
|---------------------|------------------|
|      Furniture      |         Y        |
|      Bed            |         N        |
|      Table          |         N        |
|      Chair          |         N        |
|      Cosmetics      |         Y        |
|      Lip balm       |         N        |
|      Lip stick      |         N        |
|      Eye liner      |         N        |
|      Apparel        |         Y        |
|      Shirt          |         N        |
|      Trouser        |         N        |
|---------------------|------------------|

The output format I'm looking for would be like this:

|---------------------|------------------|
|      Category       |     Product      |
|---------------------|------------------|
|      Furniture      |       Bed        |
|      Furniture      |       Table      |
|      Furniture      |       Chair      |
|      Cosmetics      |       Lip balm   |
|      Cosmetics      |       Lip stick  |
|      Cosmetics      |       Eye liner  |
|      Apparel        |       Shirt      |
|      Apparel        |       Trouser    |
|---------------------|------------------|

Upvotes: 0

Views: 331

Answers (1)

Thom A
Thom A

Reputation: 95554

With the data is it stands, you cannot get the results you are after. To be able to achieve this, you need to be able to order your data, using an ORDER BY clause, and ordering on either of these column does not achieve the same result as the sample data:

CREATE TABLE dbo.YourTable (Product varchar(20),HeaderFlag char(1));
GO

INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
      ('Bed','N'),
      ('Table','N'),
      ('Chair','N'),
      ('Cosmetics','Y'),
      ('Lipbalm','N'),
      ('Lipstick','N'),
      ('Eyeliner','N'),
      ('Apparel','Y'),
      ('Shirt','N'),
      ('Trouser','N');
GO

SELECT *
FROM dbo.YourTable
ORDER BY Product;
GO

SELECT *
FROM dbo.YourTable
ORDER BY HeaderFlag
GO

DROP TABLE dbo.YourTable;

AS you can see, the orders both differ.

If you add a column you can order on though (I'm going to use an IDENTITY) then you can achieve this:

CREATE TABLE dbo.YourTable (I int IDENTITY, Product varchar(20),HeaderFlag char(1));
GO

INSERT INTO dbo.YourTable
VALUES('Furniture','Y'),
      ('Bed','N'),
      ('Table','N'),
      ('Chair','N'),
      ('Cosmetics','Y'),
      ('Lipbalm','N'),
      ('Lipstick','N'),
      ('Eyeliner','N'),
      ('Apparel','Y'),
      ('Shirt','N'),
      ('Trouser','N');
GO

SELECT *
FROM dbo.YourTable YT
ORDER BY I;

Then you can use a cumulative COUNT to put the values into groups and get the header:

WITH Grps AS(
    SELECT YT.I,
           YT.Product,
           YT.HeaderFlag,
           COUNT(CASE YT.HeaderFlag WHEN 'Y' THEN 1 END) OVER (ORDER BY I ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW) AS Grp
    FROM dbo.YourTable YT),
Split AS(
    SELECT G.I,
           MAX(CASE G.HeaderFlag WHEN 'Y' THEN Product END) OVER (PARTITION BY G.Grp) AS Category,
           G.Product,
           G.HeaderFlag
    FROM Grps G)
SELECT S.Category,
       S.Product
FROM Split S
WHERE HeaderFlag = 'N'
ORDER BY S.I;

Upvotes: 3

Related Questions