user6798160
user6798160

Reputation: 27

SQL XML parsing split the string on letters 'TH'

I want to split a string into multiple values based on a special symbol. e.g., Here is the string

JdwnrhþTHIMPHUþOTHþþ10/1991þ02/02/2011þBHUTAN

I want it to be:

Jdwnrh THIMPHU OTH 10/1991 02/02/2011 BHUTAN

I am using the following SQL:

DECLARE @delimiter VARCHAR(50)
SET @delimiter='þ' 
;WITH CTE AS
( 
    SELECT
        CAST('<M>' + REPLACE(REPLACE(CAST(DATA as nvarchar(MAX)), @delimiter , '</M><M>'), '&', '&amp;') + '</M>' AS XML) 
        AS BDWCREGPREVADDR_XML
    FROM [JACS_RAVEN_DATA_OLD].dbo.BDWCREGPREVADDR
)

SELECT
     BDWCREGPREVADDR_XML.value('/M[1]', 'varchar(50)') As streetNo,
     BDWCREGPREVADDR_XML.value('/M[2]', 'varchar(50)') As suburb,
     BDWCREGPREVADDR_XML.value('/M[3]', 'varchar(3)') As stateCode,
     BDWCREGPREVADDR_XML.value('/M[4]', 'varchar(10)') As postalCode,
     BDWCREGPREVADDR_XML.value('/M[7]', 'varchar(50)') As country,
     BDWCREGPREVADDR_XML.value('/M[5]', 'varchar(50)') As dateFrom,
     BDWCREGPREVADDR_XML.value('/M[6]', 'varchar(50)') As dateTo
FROM CTE
GO

The query works well on all the strings other than the one provided as an example. For above the string, the query returns the following:

'Jdwnrh' ' ' 'IMPHU' 'O' ' ' '10/1991' '02/02/2011' 'BHUTAN'

It seems the code takes letters 'TH' as a new attribute and split the string on it. Does anyone know how to resolve this issue?

Upvotes: 3

Views: 82

Answers (2)

Angelina
Angelina

Reputation: 1

the key(þ) is wrong ,if you change another word ,it's ok. when use key(þ) and key(z), there are two results: enter image description here enter image description here I think maybe the key(þ) has some special meaning. hope to help you

Upvotes: 0

Blorgbeard
Blorgbeard

Reputation: 103525

This seems to be related to your collation. In Latin1_General_CS_AS, the þ character is considered equivalent to th (because it's an Old English letter that sounds like "th" when pronounced).

print replace('thornþ' collate Latin1_General_CS_AS,'þ','1') 
' output: 1orn1

This is not the case for all collations; for example, in Latin1_General_BIN they are separate:

print replace('thornþ' collate Latin1_General_BIN,'þ','1') 
' output: thorn1

So perhaps you could look at changing the collation of the column which contains the þ characters.

Upvotes: 2

Related Questions