Reputation: 31
I am trying to select only rows from the INPUT table which do not have a child - in SQL.The result is in OUTPUT. To make it easier, for now, I am considering only the columns 5H and 6H (the other columns I will do later after I have figured how the code needs to be written). The first part of the code before UNION - inserts only rows which have all columns - 1H thru 6H. Then the second part, after UNION, should make sure that only the columns where 5H is not NULL and 6H is null and which do not have a child in 6H, will be included. But it does not work. (NAME5 appears in the OUTPUT - but NAME5 is a parent for the child NAME6).
The embedded JOIN has this: WHERE v1.[5H] <> v2.[5H] -- to make sure that the row which is a parent will be excluded (i.e. v1.[5H] already exists in OUTPUT for a child row). But something goes wrong. Any idea why? Thank you for your help.
I am working in Excel - vba. I was having problems to copy the code in a readable format - hence I am enclosing the code as a picture as well
What I do not understand, why the issue is with: NAME5 is in output and it should not be there, because its child NAME6 is in OUTPUT. But here it works: the child NAME17 is in OUTPUT but its parent NAME16 is not in the OUTPUT (which is correct). Or NAME2 and NAME3 ... it is correct
the code is:
SQLQuery = _
"SELECT v1.[name], v1.[1H], v1.[2H], v1.[3H], v1.[4H], v1.[5H], v1.[6H]" & _
" FROM [INPUT$] as v1 WHERE v1.[2H] IS NOT NULL and " & _
" v1.[3H] IS NOT NULL AND v1.[4H] IS NOT NULL AND v1.[5H] IS NOT NULL AND v1.[6H] IS NOT NULL " & _
"UNION " & _
"SELECT DISTINCT v1.[name], v1.[1H], v1.[2H], v1.[3H], v1.[4H], v1.[5H], v1.[6H]" & _
" FROM [INPUT$] as v1 " & _
" INNER JOIN " & _
" (SELECT * " & _
" FROM [INPUT$] WHERE [5H] IS NOT NULL AND [6H] IS NULL ) as v2 " & _
" ON v1.[1H] = v2.[1H] and v1.[2H] = v2.[2H] and v1.[3H] = v2.[3H] and v1.[4H] = v2.[4H] " & _
" WHERE v1.[5H] <> v2.[5H] "
Upvotes: 2
Views: 91
Reputation: 5493
Self JOIN source data to find childs for every row.
If row do not have child - this is target row.
There t2 (right side) row is child of t1 (left side) row, if
t1.cname<>t2.cname -- a row cannot be a parent/child of itself
and (t1.c1H=t2.c1H or t1.c1H is null)
and (t1.c2H=t2.c2H or t1.c2H is null)
and (t1.c3H=t2.c3H or t1.c3H is null)
and (t1.c4H=t2.c4H or t1.c4H is null)
and (t1.c5H=t2.c5H or t1.c5H is null)
and (t1.c6H=t2.c6H or t1.c6H is null)
There
value of child is equal to parent or parent is null
for example:
(1,1) is child of (1,null)
(1,1,2) is child of (1,1,null)
(1,1,2) is not child of (1,1,1)
When obviously (1,1,1,1,1,1) is child of (1,1,1,1,1,null)
Note:
(1,1,1,1,1,1) is child of (1,1,1,1,1,1) - we cannot distinguish who is the parent, who is the child.
(1,1,1,1,1,null) is child of (1,1,1,1,1,null)
If you have those cases check additionally.
See example
cname | c1H | c2H | c3H | c4H | c5H | c6H | comment |
---|---|---|---|---|---|---|---|
name1 | 1 | 1 | 1 | 1 | null | null | no |
name2 | 1 | 1 | 1 | 1 | 1 | null | no |
name3 | 1 | 1 | 1 | 1 | 1 | 1 | yes |
name4 | 1 | 1 | 1 | 2 | null | null | no |
name5 | 1 | 1 | 1 | 2 | 1 | null | no |
name6 | 1 | 1 | 1 | 2 | 1 | 1 | yes |
name19 | 1 | 1 | 1 | 2 | 2 | null | yes |
name20 | 1 | 1 | 1 | 2 | 3 | null | yes |
name21 | 1 | 1 | 1 | 2 | 4 | null | yes |
name22 | 1 | 1 | 1 | 3 | null | null | no/yes? |
name7 | 1 | 1 | 1 | null | null | null | no |
name8 | 1 | 1 | 2 | null | null | null | no |
name9 | 1 | 1 | 2 | 1 | null | null | no |
name10 | 1 | 1 | 2 | 1 | 1 | null | no |
name11 | 1 | 1 | 2 | 1 | 1 | 1 | yes |
name12 | 1 | 1 | null | null | null | null | no |
name13 | 1 | 2 | null | null | null | null | no |
name14 | 1 | 2 | 1 | null | null | null | no |
name15 | 1 | 2 | 1 | 1 | null | null | no |
name16 | 1 | 2 | 1 | 1 | 1 | null | no |
name17 | 1 | 2 | 1 | 1 | 1 | 1 | yes |
name18 | 1 | null | null | null | null | null | no |
Query
select *
from test t1
left join test t2
on t1.cname<>t2.cname
and (t1.c1H=t2.c1H or t1.c1H is null)
and (t1.c2H=t2.c2H or t1.c2H is null)
and (t1.c3H=t2.c3H or t1.c3H is null)
and (t1.c4H=t2.c4H or t1.c4H is null)
and (t1.c5H=t2.c5H or t1.c5H is null)
and (t1.c6H=t2.c6H or t1.c6H is null)
where t2.cname is null -- no childs
Output
cname | c1H | c2H | c3H | c4H | c5H | c6H | comment |
---|---|---|---|---|---|---|---|
name3 | 1 | 1 | 1 | 1 | 1 | 1 | yes |
name6 | 1 | 1 | 1 | 2 | 1 | 1 | yes |
name19 | 1 | 1 | 1 | 2 | 2 | null | yes |
name20 | 1 | 1 | 1 | 2 | 3 | null | yes |
name21 | 1 | 1 | 1 | 2 | 4 | null | yes |
name22 | 1 | 1 | 1 | 3 | null | null | no/yes? |
name11 | 1 | 1 | 2 | 1 | 1 | 1 | yes |
name17 | 1 | 2 | 1 | 1 | 1 | 1 | yes |
For 'name22' in your data, I think this is target row.
Update1. This is solution for all columns, not only 5 and 6.
Upvotes: 0
Reputation: 3187
The record with name=Name5 is included because the 2nd query (after the union) allows it.
...
UNION
"SELECT DISTINCT v1.[name], v1.[1H], v1.[2H], v1.[3H], v1.[4H], v1.[5H], v1.[6H]" & _
" FROM [INPUT$] as v1 " & _
" INNER JOIN " & _
" (SELECT * " & _
" FROM [INPUT$] WHERE [5H] IS NOT NULL AND [6H] IS NULL ) as v2 " & _
" ON v1.[1H] = v2.[1H] and v1.[2H] = v2.[2H] and v1.[3H] = v2.[3H] and v1.[4H] = v2.[4H] " & _
" WHERE v1.[5H] <> v2.[5H] "
If you examine the values from v2.5H you can see 3 values that satisfy the join criteria
ON v1.[1H] = v2.[1H] and v1.[2H] = v2.[2H] and v1.[3H] = v2.[3H] and v1.[4H] = v2.[4H]
and WHERE clause.
WHERE v1.[5H] <> v2.[5H]
name | 1H | 2H | 3H | 4H | 5H | v2.5H | 6H |
---|---|---|---|---|---|---|---|
NAME19 | 1 | 1 | 1 | 2 | 2 | 1,3,4 | null |
NAME20 | 1 | 1 | 1 | 2 | 3 | 1,2,4 | null |
NAME21 | 1 | 1 | 1 | 2 | 4 | 1,2 | null |
NAME5 | 1 | 1 | 1 | 2 | 1 | 2,3,4 | null |
NAME6 | 1 | 1 | 1 | 2 | 1 | 2,3,4 | 1 |
Upvotes: 0