Retrieving Common rows from two tables having Duplicate records

Its my first question, so don't know where I might go wrong. Here is my scenario : I have two tables : tblA and tblB. A procedure that first fills some data in tblA and then the same data in tblB. Now due to certain glitch in procedure query, some values of certain columns were not inserted in tblB but in tblA. I now want to update tblB with the records in tblA. I can't find the corresponding records for update. For easing the task I have given the sample ready to use query

--DROP TABLE #tblA
--DROP TABLE #tblB
--TRUNCATE TABLE #tblA
--TRUNCATE TABLE #tblB
CREATE TABLE #tblA (col1 char(2), col2 char(2), col3 int, col4 char)
CREATE TABLE #tblB (ID char(2),col1 char(2), col2 char(2), col3 int, col4 char)

INSERT INTO #tblA VALUES ('A','B',1,'C')
INSERT INTO #tblA VALUES ('A1','B',2,'C')
INSERT INTO #tblA VALUES ('A1','B',3,'C')

INSERT INTO #tblB VALUES ('I1','A','B',NULL,'C') -- Here we can see that values could not be inserted in col3
INSERT INTO #tblB VALUES ('I2','A1','B',NULL,'C')
INSERT INTO #tblB VALUES ('I3','A1','B',3,'C')

What I have tried so far (excluding other queries that I tried, this gives me the closest result) (i've even tried the commented section)

--UPDATE B
--SET B.col3=A.col3
SELECT  DISTINCT A.*, B.*
FROM #tblA A
INNER JOIN #tblB B
ON 1=1
AND A.col1=B.col1
AND A.col2 = B.col2
AND A.col4 = B.col4
--AND (A.col3<>B.col3)
AND (B.col3 IS NULL OR (B.col3 IS NOT NULL AND A.col3=B.col3))
--AND B.col3 IS NULL

The result I get is as followed:

col1    col2    col3    col4    ID  col1    col2    col3    col4
A        B       1       C      I1   A       B      NULL    C
A1       B       2       C      I2   A1      B      NULL    C
A1       B       3       C      I2   A1      B      NULL    C
A1       B       3       C      I3   A1      B      3       C

What modification should I do in my select query (which is going to be my update query) so that I get result as followed : (result that removes third row)

col1    col2    col3    col4    ID  col1    col2    col3    col4
A        B       1       C      I1   A       B      NULL    C
A1       B       2       C      I2   A1      B      NULL    C  
A1       B       3       C      I3   A1      B      3       C

Upvotes: 1

Answers (2)

Beartums

Reputation: 1350

Unfortunately, based on the information you have given, I don't think there is a good generalized solution. I'd need more information about the possible patterns. As asked, the obvious answer is:

SELECT *
FROM #tblA a
inner join #tblb b
on a.col1=b.col1
and a.col2 = b.col2
and a.col4 = b.col4
and a.col3 = right(b.ID,1)

alternatively, the last line could be and a.col3 = coalesce(b.col3,right(b.ID,1))

Which depends on the ID in #tblb being reflected in Col3. I don't think that's what you want. From an algorithmic point of view, if you can't depend on a meaningful ID, I'm not sure there is a good way to do this. If you can describe the general angorithm you want to apply, then we can write the code to match it.

For example, something that might be meaningful is:

Tbla joins to tblb on all columns
When the join fails, I want to pick the first record from tblb that:
- matches columns 1, 2, and 4 in table a
- has not already been joined to tbla when sorted by ID

In certain situations I could see that algorithm being useful. But in essence, your are joining tables with duplicate keys, which gets you a cross join on the records where the key is duplicated.

Hope this helps. If you clarify a bit more what you are trying to do, I might be able to help a bit more.

UPDATE:

If, as you say in your comment, you simply want to make sure the two tables match exactly, then you can use the following code:

CREATE TABLE #tblA (col1 char(2), col2 char(2), col3 int, col4 char)
CREATE TABLE #tblB (col1 char(2), col2 char(2), col3 int, col4 char)

INSERT INTO #tblA VALUES ('A','B',1,'C')
INSERT INTO #tblA VALUES ('A1','B',2,'C')
INSERT INTO #tblA VALUES ('A1','B',3,'C')

INSERT INTO #tblB VALUES ('A','B',NULL,'C') 
INSERT INTO #tblB VALUES ('A1','B',NULL,'C')
INSERT INTO #tblB VALUES ('A1','B',3,'C')

delete from #tblb 
where NOT Exists (Select * from #tbla  a 
    where a.col1 = #tblb.col1
    and a.col2 = #tblb.col2
    and a.col3 = #tblb.col3
    and a.col4 = #tblb.col4)

INSERT #tblb
SELECT * from #tbla
where not exists (select * from #tblb b
    where b.col1 = #tbla.col1
    and b.col2 = #tbla.col2
    and b.col3 = #tbla.col3
    and b.col4 = #tbla.col4)

select * from #tbla a
inner join #tblb b
on b.col1 = a.col1
    and b.col2 = a.col2
    and b.col3 = a.col3
    and b.col4 = a.col4

Of course, depending on the size of the table, the size of the row, the indexing of both tables, and how often you run it, it might make more sense to simply drop and reinsert.

It is possible to only update the columns that are incorrect, but I think the coding and maintenance overhead would no be worth it, not to mention that performance would probably be pretty bad.

Does this help?

Upvotes: 1

Ajay2707

Reputation: 5798

you don't want to use inner join as there are multiple values comes, you want TOP 1 or Top First record's Col3 value.

So I update the update query as below and this update the result as you want.

update  #tblB 
SET col3= 
( select top 1 a.col3 from #tblA a where A.col1=#tblB.col1
AND A.col2 = #tblB.col2
AND A.col4 = #tblB.col4)

select * from #tblB

Upvotes: 0

Retrieving Common rows from two tables having Duplicate records

Answers (2)

Related Questions