wysiwyg
wysiwyg

Reputation: 624

Extract number from delimited string, with varying formats

I'm trying to join two tables, OrderLines and Production, from our ERP system, running SQL Server 2014. The way it works is, if a product is added to an order and there is nothing in stock for that product, a "Production Entry" (Bill of Materials, essentially) is generated automatically for Production.

The GeneratedEntries field in the OrderLines table captures that data, recording which Production Entry numbers were generated for that order line. The field usually is in the following format: It starts with PD~, then is followed by an entry number, with subsequent entry numbers delimited by another tilde ~. So a standard value for this field, for an order line that caused 2 entry numbers to be generated, might look like this: PD~12345~67891

The issue is, the formatting for that field is sometimes not consistent, in that there are sometimes trailing tildes, sometimes the PD~ is repeated in middle of the string, other extra garbage characters, etc.

I don't know how to extract the entry numbers from GeneratedEntries to join the two tables, in a way that would work for all formatting possibilities. Is there any way to do this?

Sample tables below with relevant columns (OrderNumber and Product aren't really relevant, just there for context...)

OrderLines Table
+-------------+---------+----------------------+
| OrderNumber | Product |   GeneratedEntries   |
+-------------+---------+----------------------+
|           1 | A       | PD~10005             |
|           1 | B       | PD~10006~            |
|           1 | C       | PD~10007~10008~10009 |
|           2 | R       | PD~10010~~10011      |
|           2 | L       | ~PD~10012~~          |
|           2 | Z       | PD~10013 PD~10014    |
+-------------+---------+----------------------+


Production Table
+-----------------+
| ProductionEntry |
+-----------------+
|           10005 |
|           10006 |
|           10007 |
|           10008 |
|           10009 |
|           10010 |
|           10011 |
|           10012 |
|           10013 |
|           10014 |
+-----------------+

Upvotes: 1

Views: 36

Answers (1)

John Cappelletti
John Cappelletti

Reputation: 81990

Virtually any parse/split function will do. Here is an inline approach.

The only trick is that we replace any SPACE with a ~ and filter with a try_convert()

Example

Declare @YourTable Table ([OrderNumber] varchar(50),[Product] varchar(50),[GeneratedEntries] varchar(50))
Insert Into @YourTable Values 
 (1,'A','PD~10005')
,(1,'B','PD~10006~')
,(1,'C','PD~10007~10008~10009')
,(2,'R','PD~10010~~10011')
,(2,'L','~PD~10012~~')
,(2,'Z','PD~10013 PD~10014')

Select A.OrderNumber
      ,A.Product
      ,B.* 
 From @YourTable  A
 Cross Apply (
                Select RetSeq = Row_Number() over (Order By (Select null))
                      ,RetVal = LTrim(RTrim(B.i.value('(./text())[1]', 'varchar(max)')))
                From  (Select x = Cast('<x>' + replace(replace([GeneratedEntries],' ','~'),'~','</x><x>')+'</x>' as xml).query('.')) as A 
                Cross Apply x.nodes('x') AS B(i)
             ) B
 Where try_convert(int,B.RetVal) is not null

I kept the sequence RetSeq just in case it was important.

Returns

OrderNumber Product RetSeq  RetVal
1           A       2       10005
1           B       2       10006
1           C       2       10007
1           C       3       10008
1           C       4       10009
2           R       2       10010
2           R       4       10011
2           L       3       10012
2           Z       2       10013
2           Z       4       10014

Upvotes: 1

Related Questions