Trevor Oakley
Trevor Oakley

Reputation: 448

Split column data and insert - SQL Server stored procedures

I have a table with a few hundred thousand rows and the data format is index (int), and a words nvarchar(1000). The words string is made up of a collection of words separated by a space, e.g word1 word2 word3. I want to read the word table and create a dictionary. In terms of pseudo code this is what I want:

INSERT INTO dictionary (dictionaryword) 
SELECt splitBySpace(words) FROM word;

This is simple enough to code in Java or C#, but I have found the system takes a long time to process the data. In other processing the cost benefit to running SQL to handle the query (i.e not processing the data in c# or Java) is huge.

I want to create a stored procedure which reads the words, splits them, and then creates the dictionary. I have seen various split procedures which are a little complex, e.g https://dba.stackexchange.com/questions/21078/t-sql-table-valued-function-to-split-a-column-on-commas but I could not see how to re-code this for the task of reading a whole database, splitting the words, and inserting them.

Has anyone any sample code to split the column data and then insert it which can wholly implemented in SQL for reasons of efficiency?

Upvotes: 3

Views: 1657

Answers (2)

shA.t
shA.t

Reputation: 16968

I suggest you to use a stored procedure like this:

CREATE PROCEDURE spSplit 
    @words nvarchar(max),
    @delimiter varchar(1) = ' '
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max)
    SELECT @sql = 'SELECT ''' + REPLACE(@words, @delimiter, ''' As res UNION ALL SELECT ''') + ''''
    --or for removing duplicates SELECT @sql = 'SELECT ''' + REPLACE(@words, @delimiter, ''' As res UNION SELECT ''') + ''''
    EXEC(@sql)
END
GO

This stored procedure will give you the results that you can use it in INSERT INTO statement, and:

CREATE PROCEDURE spSplit 
    @words nvarchar(max) = 'a bc lkj weu 234 , sdsd 3 and 3 & test',
    @delimiter varchar(1) = ' ',
    @destTable nvarchar(255), 
    @destColumn nvarchar(255)
AS
BEGIN
    SET NOCOUNT ON;
    DECLARE @sql nvarchar(max)
    SELECT @sql = 'INSERT INTO [' + @destTable + '] ([' + @destColumn + ']) SELECT res FROM ('
    SELECT @sql = @sql + 'SELECT ''' + REPLACE(@words, @delimiter, ''' As res UNION ALL SELECT ''') + ''''
    SELECT @sql = @sql + ') DT WHERE res NOT IN (SELECT [' + @destColumn + '] FROM [' + @destTable + '])'
    EXEC(@sql)
END
GO

This stored procedure will do the insert with out inserting duplicates.

Upvotes: 0

Giorgi Nakeuri
Giorgi Nakeuri

Reputation: 35780

Here is the solution.

DDL:

create table sentence(t varchar(100))

insert into sentence values
('Once upon a time in America'),
('Eyes wide shut')

DML:

select distinct ca.d as words from sentence s
cross apply(select split.a.value('.', 'varchar(100)') as d 
            from 
            (select cast('<x>' + REPLACE(s.t, ' ', '</x><x>') + '</x>' as xml) as d) as a 
             cross apply d.nodes ('/x') as split(a)) ca

Output:

words

a
America
Eyes
in
Once
shut
time
upon
wide

Fiddle http://sqlfiddle.com/#!6/54dff/4

Upvotes: 2

Related Questions