user1762476
user1762476

Reputation: 67

Find the duplicates in the Xml column

I have a table with the following sample columns with data as given below. Each account number has a corresponding createddt and xmlpayload column values.

In the given 4 rows, I need to find the valid duplicates. Each valid account number has 2 xml rows (one statement and one pdf xml record). If the account number has 2 records with same type of xml root node like pdf record (2335577) it should be a duplicate. How to filter the exact duplicates by querying the xml column. Please help.

Records:  
    AccountNo   **Xmlpayload**  
    2335566     <ns0:Statement xmlns:ns0="uri"><empid>123<empid/><ns0:Statement>
    2335566     <ns0:PDFStmt xmlns:ns0="uri"><empid>123<empid/></ns0:PDFStmt> 
    2335577     <ns0:PDFStmt xmlns:ns0="uri"><empid>123<empid/></ns0:PDFStmt>       
    2335577     <ns0:PDFStmt xmlns:ns0="uri"><empid>123<empid/></ns0:PDFStmt>      
    4332355     <ns0:PDFStmt xmlns:ns0="uri"><empid>123<empid/></ns0:PDFStmt>       
    4332355     <ns0:Statement xmlns:ns0="uri"><empid>123<empid/></ns0:Statement>      
    6723588     <ns0:Statement xmlns:ns0="uri"><empid>123<empid/></ns0:Statement>       
    6723588     <ns0:Statement xmlns:ns0="uri"><empid>123<empid/></ns0:Statement>   

My Expected Output:  
    2335577     <ns0:PDFStmt xmlns:ns0="uri"><empid>123<empid/></ns0:PDFStmt>       
    2335577     <ns0:PDFStmt xmlns:ns0="uri"><empid>123<empid/></ns0:PDFStmt>      
    6723588     <ns0:Statement xmlns:ns0="uri"><empid>123<empid/></ns0:Statement>       
    6723588     <ns0:Statement xmlns:ns0="uri"><empid>123<empid/></ns0:Statement>   

I am using SQL 2008 R2 version.

Upvotes: 1

Views: 1802

Answers (2)

roman
roman

Reputation: 117345

For SQL Server you can use exist() method from sqlxml, like this:

with xmlnamespaces ('uri' as ns0)
select *
from Table1 as t
where t.Xmlpayload.exist('ns0:PDF[2]') = 1

or you can use xquery count() function with value() method:

with xmlnamespaces ('uri' as ns0)
select *
from Table1 as t
where t.Xmlpayload.value('count(ns0:PDF)', 'int') > 1;

sql fiddle demo

update

If you have one for each AccountNo, you can use this query:

with xmlnamespaces ('uri' as ns0)
select *
from Table2 as t
where t.Xmlpayload.exist('ns0:PDFStmt[2]') = 1

If you have several rows for each AccountNo (and want to get Xmlpayload in the resultset):

with
xmlnamespaces ('uri' as ns0),
cte as (
    select
        *, count(*) over(partition by AccountNo) as cnt
    from Table1 as t
    where t.Xmlpayload.exist('ns0:PDFStmt') = 1
)
select *
from cte
where cnt > 1;

sql fiddle demo

Upvotes: 1

Gatej Alexandru
Gatej Alexandru

Reputation: 92

I'm not 100% what you want to count. What I know for sure is that you can count nodes in XML file in MSSQL versions like this:

declare @xml xml
select @xml= convert(xml,N'<ns0 xmlns:ns0="uri"><empid>123</empid><age>23</age></ns0>
<ns0 xmlns:ns0="uri"><empid>123</empid><age>32</age></ns0>    
<ns0 xmlns:ns0="uri"><empid>123</empid><age>23</age></ns0>
<ns0 xmlns:ns0="uri"><empid>123</empid><age>32</age></ns0>')

select count(*) as nr
from @xml.nodes('root/ns0/empid') as S(N)

Be aware that I modified also your xml because wasn't valid. i transformed empid>123 empid/> in empid>123 /empid>

, same for age

I removed :Statement and :PDF. This part is valid but i don't know the sysntax for searching throw nodes like this, but is working also for them.

Upvotes: 0

Related Questions