cisconethead
cisconethead

Reputation: 219

Can I use SQL to split contents of a table column stored as CSV (comma separated values) into individual rows in a new table?

I see there are a couple of related questions with answers, but not exactly what I need, so I'll ask a new question. I have this CSV file with thousands of rows of store inventories data which I'd like to import into a MS SQL Server database and use SQL to process them. After importing the CSV file, the SQL table will have three columns that contain CSV data. The problem is that I need to get this CSV data into individual rows in order to analyze it more. I'd like to end up with two tables in the end: the original table created by the import of the CSV file, and a table created by splitting out the CSV. Here is a representation of what the two tables would look like:

/Table1 (the original CSV file).  First row is column names:
StoreID,Date,StoreName,City,State,Category1CSV,Category2CSV,Category3CSV
1051,2/16/2014,Easton,Columbus,OH,"Flour,Yeast,Baking Powder","Milk,Water,Oil","Cinnamon,Sugar"
1425,1/14/2014,Crocker Park,Westlake,OH,"Baking Powder,Yeast,Flour","Oil,Milk,Water","Rosemay,Cinnamon,Sugar"

Table1

/Table2 (after splitting the CSV column contents). First row is column names:
StoreID,Date,StoreName,City,State,ItemName,ItemRank,ItemCategory
1051,2/16/2014,Easton,Columbus,OH,Flour,1,1
1051,2/16/2014,Easton,Columbus,OH,Yeast,2,1
1051,2/16/2014,Easton,Columbus,OH,Baking Powder,3,1
1051,2/16/2014,Easton,Columbus,OH,Milk,4,2
1051,2/16/2014,Easton,Columbus,OH,Water,5,2
1051,2/16/2014,Easton,Columbus,OH,Oil,6,2
1051,2/16/2014,Easton,Columbus,OH,Cinnamon,7,3
1051,2/16/2014,Easton,Columbus,OH,Sugar,8,3
1425,1/14/2014,Crocker Park,Westlake,OH,Baking Powder,1,1
1425,1/14/2014,Crocker Park,Westlake,OH,Yeast,2,1
1425,1/14/2014,Crocker Park,Westlake,OH,Flour,3,1
1425,1/14/2014,Crocker Park,Westlake,OH,Oil,4,2
1425,1/14/2014,Crocker Park,Westlake,OH,Milk,5,2
1425,1/14/2014,Crocker Park,Westlake,OH,Water,6,2
1425,1/14/2014,Crocker Park,Westlake,OH,Rosemary,7,3
1425,1/14/2014,Crocker Park,Westlake,OH,Cinnamon,8,3
1425,1/14/2014,Crocker Park,Westlake,OH,Sugar,9,3

Table2

The SQL column data types are:

Table 1
StoreID - int
Date - date
StoreName - nvarchar(50)
City- nvarchar(50)
State- nvarchar(50)
Category1CSV - nvarchar(MAX)
Category2CSV - nvarchar(MAX)
Category3CSV - nvarchar(MAX)

Table2
StoreID - int
Date - date
StoreName - nvarchar(50)
City- nvarchar(50)
State - nvarchar(50)
ItemName - nvarchar(50)
ItemRank - tinyint
ItemCategory -tinyint

The Table 1 columns labeled Category1CSV, Category2CSV, and Category3CSV contents map to Table 2 columns: ItemName, ItemRank, ItemCategory, where ItemName is the Item (example: Flour), ItemRank is the order of the item in the CSV list, and ItemCategory is either 1,2 or 3, depending on whether the data came from Category1CSV, Category2CSV or Category3CSV.

The most important aspect of this (other than splitting out the CSV column) is to maintain the order of items within the CSV columns. for example, StroreID 1051 has Category1CSV contents of "Flour,Yeast,Baking Powder". Those will map to the columns ItemName, ItemRank, and ItemCategory such that ItemName = Flour, it's ItemRank = 1, and the ItemCategory = 1. This would be the first row in Table 2. The second row would be ItemName = Yeast, it's ItemRank = 2, and the ItemCategory = 1, and so on until you end up with what looks like Table 2 above. Also, you'll notice that the ItemRank numbering starts with the contents of the column Category1CSV, then continues to Category2CSV and finally Category3CSV.

After that lengthy explanation, is it possible to have some SQL statement that will create Table 2 from Table 1 for me? If so, what would that look like? I'm planning to use MS SQL Server Express 2012.

OR... as someone suggest to me, it may be best to have some VBA in Excel or Python script (maybe in conjunction with Notepad++?) to accomplish this, then just import the final data? I don't care either way, I just can't keep manually editing the CSV file, as it's very tedious and time consuming.

Upvotes: 3

Views: 1466

Answers (1)

rhholt
rhholt

Reputation: 419

I would use a Split function in order to split the additional values.

The Split function I use uses a Numbers (a table with numbers 1 through 1,000,000) in order to facilitate the split process.

Once the Numbers table and Split functions are in place, I would use the CROSS APPLY function to apply the Split to the CSV columns. the code to split the CSV column would look like this (this is just a couple test rows based on your data provided).

DECLARE @Table TABLE (val1 VARCHAR(50), val2 VARCHAR(50), val3 VARCHAR(50), csv1 VARCHAR(100), csv2 VARCHAR(100), csv3 VARCHAR(100))
INSERT INTO @Table
VALUES ('Easton', 'Columbus', 'OH', 'Flour,Yeast,Baking Powder','Milk,Water,Oil','Cinnamon,Sugar')
 , ('Crocker Park', 'Westlake', 'OH', 'Baking Powder,Yeast,Flour','Oil,Milk,Water','Rosemary,Cinnamon,Sugar')

SELECT tbl.val1, val2, val3, apl.*
  FROM @Table tbl
 CROSS APPLY(
    SELECT val
      FROM dbo.Split(tbl.csv1, ',')
   ) apl

UNION ALL

SELECT tbl.val1, val2, val3, apl.*
  FROM @Table tbl
 CROSS APPLY(
    SELECT val
      FROM dbo.Split(tbl.csv2, ',')
   ) apl

UNION ALL

SELECT tbl.val1, val2, val3, apl.*
  FROM @Table tbl
 CROSS APPLY(
    SELECT val
      FROM dbo.Split(tbl.csv3, ',')
   ) apl
 ORDER BY val1

The output of which will look like this based on the sample date.

Crocker Park    Westlake    OH  Baking Powder
Crocker Park    Westlake    OH  Cinnamon
Crocker Park    Westlake    OH  Flour
Crocker Park    Westlake    OH  Milk
Crocker Park    Westlake    OH  Oil
Crocker Park    Westlake    OH  Rosemary
Crocker Park    Westlake    OH  Sugar
Crocker Park    Westlake    OH  Water
Crocker Park    Westlake    OH  Yeast
Easton          Columbus    OH  Baking Powder
Easton          Columbus    OH  Cinnamon
Easton          Columbus    OH  Flour
Easton          Columbus    OH  Milk
Easton          Columbus    OH  Oil
Easton          Columbus    OH  Sugar
Easton          Columbus    OH  Water
Easton          Columbus    OH  Yeast

Here is the code to create the Numbers table

DECLARE @tbl TABLE (n INT)
INSERT INTO @tbl (n)
VALUES (1),(2),(3),(4),(5),(6),(7),(8),(9),(10)

; WITH Num AS(
SELECT one.n
  FROM @tbl one
 CROSS JOIN @tbl two
 CROSS JOIN @tbl three
 CROSS JOIN @tbl four
 CROSS JOIN @tbl five
 CROSS JOIN @tbl six
)
SELECT ROW_NUMBER() OVER(ORDER BY n) AS n
  INTO dbo.Numbers
  FROM Num

ALTER TABLE dbo.Numbers
ALTER COLUMN n INT NOT NULL

ALTER TABLE dbo.Numbers 
ADD PRIMARY KEY (n)
GO

Lastly, here is the code to create the Split function.

CREATE FUNCTION dbo.Split
(
@List VARCHAR(MAX),
@Delimiter VARCHAR(255)
)
RETURNS TABLE
AS
RETURN
(
SELECT val = SUBSTRING(@List, n, CHARINDEX(@Delimiter, @List + @Delimiter, n) - n)
  FROM dbo.Numbers
 WHERE n <= CONVERT(INT, LEN(@List)) 
   AND SUBSTRING(@Delimiter + @List, n, LEN(@Delimiter)) = @Delimiter
 );
GO

Upvotes: 2

Related Questions