Reputation: 448
I have a table with a few hundred thousand rows and the data format is index (int), and a words nvarchar(1000). The words string is made up of a collection of words separated by a space, e.g word1 word2 word3
. I want to read the word table and create a dictionary. In terms of pseudo code this is what I want:
INSERT INTO dictionary (dictionaryword)
SELECt splitBySpace(words) FROM word;
This is simple enough to code in Java or C#, but I have found the system takes a long time to process the data. In other processing the cost benefit to running SQL to handle the query (i.e not processing the data in c# or Java) is huge.
I want to create a stored procedure which reads the words, splits them, and then creates the dictionary. I have seen various split procedures which are a little complex, e.g https://dba.stackexchange.com/questions/21078/t-sql-table-valued-function-to-split-a-column-on-commas but I could not see how to re-code this for the task of reading a whole database, splitting the words, and inserting them.
Has anyone any sample code to split the column data and then insert it which can wholly implemented in SQL for reasons of efficiency?
Upvotes: 3
Views: 1657
Reputation: 16968
I suggest you to use a stored procedure like this:
CREATE PROCEDURE spSplit
@words nvarchar(max),
@delimiter varchar(1) = ' '
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql nvarchar(max)
SELECT @sql = 'SELECT ''' + REPLACE(@words, @delimiter, ''' As res UNION ALL SELECT ''') + ''''
--or for removing duplicates SELECT @sql = 'SELECT ''' + REPLACE(@words, @delimiter, ''' As res UNION SELECT ''') + ''''
EXEC(@sql)
END
GO
This stored procedure will give you the results that you can use it in INSERT INTO
statement, and:
CREATE PROCEDURE spSplit
@words nvarchar(max) = 'a bc lkj weu 234 , sdsd 3 and 3 & test',
@delimiter varchar(1) = ' ',
@destTable nvarchar(255),
@destColumn nvarchar(255)
AS
BEGIN
SET NOCOUNT ON;
DECLARE @sql nvarchar(max)
SELECT @sql = 'INSERT INTO [' + @destTable + '] ([' + @destColumn + ']) SELECT res FROM ('
SELECT @sql = @sql + 'SELECT ''' + REPLACE(@words, @delimiter, ''' As res UNION ALL SELECT ''') + ''''
SELECT @sql = @sql + ') DT WHERE res NOT IN (SELECT [' + @destColumn + '] FROM [' + @destTable + '])'
EXEC(@sql)
END
GO
This stored procedure will do the insert with out inserting duplicates.
Upvotes: 0
Reputation: 35780
Here is the solution.
DDL:
create table sentence(t varchar(100))
insert into sentence values
('Once upon a time in America'),
('Eyes wide shut')
DML:
select distinct ca.d as words from sentence s
cross apply(select split.a.value('.', 'varchar(100)') as d
from
(select cast('<x>' + REPLACE(s.t, ' ', '</x><x>') + '</x>' as xml) as d) as a
cross apply d.nodes ('/x') as split(a)) ca
Output:
words
a
America
Eyes
in
Once
shut
time
upon
wide
Fiddle http://sqlfiddle.com/#!6/54dff/4
Upvotes: 2