Reputation: 83063

What is wrong with this join of a table to itself?

I have a table called TempAllAddresses with the following columns - ID, Address, State. I want to populate a new table with Address, State, and Count. Count should represent how many records there are in the TempAllAddresses table that have an address like this address followed by a wildcard. If that made no sense, here's an example to illustrate - Let's say I have a record like this:

ID      Address      State
12345   13 Phoenix   NY

What I want to do is insert a new record into a new table called AddressCount that has 13 Phoenix for the Address, NY for the State, and the number of records in the table that have NY as the State and an address LIKE '13 Phoenix%' for the Count.

I want to accomplish this with an inner join of TempAllAddresses on itself. This is what I've tried, but it doesn't seem to accomplish what I'm looking for:

SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

The Count is definitely off, though. It should be equivalent to running "SELECT COUNT(*) FROM TempAllAddresses WHERE State=thisRecordsState and Address LIKE thisRecordsAddress + '%'". How can I accomplish this? What am I doing wrong?

Edit:

The count seems to be off in the following way - If I have a record like I mentioned above, and then I have 2 other records that also have a state of NY, and then have addresses of "13 Phoenix Road" and "13 Phoenix Rd", then I want to get in my final table a record like this:

13 Phoenix    NY    3

Instead, I seem to be getting:

13 Phoenix    NY    9

I'm not quite sure what's happening here... some sort of cartesian product? Permutations...? Can anyone explain this?

Edit 2: A further edit since I seem to be misunderstood (and really need a solution :( )... Here is a query with a correlated subselect that accomplishes what I'm looking for. I'd like to do the same thing with an inner join of the table on itself rather than a subselect.

SELECT Address, State, 
    (SELECT Count(*)
    FROM TempAllAddresses innerQry 
    WHERE innerQry.address LIKE outerQry.address + '%' 
        AND innerQry.state = outerQry.state) As NumEntities
FROM TempAllAddresses outerQry

Basically, for each record, I want to get the number of records in the table that have the same state and an address that begins with this address (or is equal to... I do want to include this address as part of the count).

Upvotes: 4

Answers (7)

Jeremy Pridemore

Reputation: 1995

Here's two solutions, one using a CROSS APPLY and the other using an INNER JOIN like you wanted originally. I hope this helps. :)

DECLARE @TempAllAddresses TABLE
(
    ID INT PRIMARY KEY IDENTITY(1, 1) NOT NULL
    , [Address] VARCHAR(250) NOT NULL
    , [State] CHAR(2) NOT NULL
)

INSERT INTO @TempAllAddresses
VALUES ('13 Phoenix', 'NY')
        , ('13 Phoenix St', 'NY')
        , ('13 Phoenix Street', 'NY')
        , ('1845 Test', 'TN')
        , ('1337 Street', 'WA')
        , ('1845 T', 'TN')

SELECT
    TempAddresses.ID
    , TempAddresses.[Address]
    , TempAddresses.[State]
    , TempAddressesCounted.AddressCount
FROM @TempAllAddresses TempAddresses
CROSS APPLY
(
    SELECT
        COUNT(*) AS AddressCount
    FROM @TempAllAddresses TempAddressesApply
    WHERE TempAddressesApply.[Address] LIKE (TempAddresses.[Address] + '%')
        AND TempAddressesApply.[State] = TempAddresses.[State]
) TempAddressesCounted

SELECT
    TempAddresses.ID
    , TempAddresses.[Address]
    , TempAddresses.[State]
    , COUNT(*) AS AddressCount
FROM @TempAllAddresses TempAddresses
INNER JOIN @TempAllAddresses TempAddressesJoin
    ON TempAddressesJoin.[Address] LIKE (TempAddresses.[Address] + '%')
            AND TempAddressesJoin.[State] = TempAddresses.[State]
GROUP BY TempAddresses.ID
    , TempAddresses.[Address]
    , TempAddresses.[State]

Upvotes: 1

RapidCoder

Reputation: 73

Have you tried analytical functions - they are often the easiest solution. I am not familiar with your table structure, but it should be something like this:

SELECT t1.Address, t1.State, 
COUNT(t2.address) OVER (PARTITION BY t2.state) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
ON t1.state = t2.state
AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

You can even add ORDER BY in the OVER clause. See Oracle FAQs for some explanation.

Upvotes: 0

kevpie

Reputation: 26098

Nested GroupBy:

The subquery will find the shortest address for each distinct address.
This doesn't consider case sensitivity.
Then each version of these addresses are counted.

SQL:

SELECT Address, State, count(1) As NumEntities
FROM ( 
    SELECT min(t1.Address) as Address, t1.State
    FROM TempAllAddresses t1
    INNER JOIN TempAllAddresses t2
     ON t1.state = t2.state
     AND T2.Address LIKE t1.address + '%'
    GROUP BY t1.State, t2.Address
) GROUP By State, Address

Upvotes: 0

user533832

Reputation:

There is double counting going on when there are multiple rows with exactly the same address.

Try:

SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM (select distinct Address, State from TempAllAddresses) t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

Upvotes: 0

lijie

Reputation: 4871

QUERY A:

SELECT t1.Address, t1.State, COUNT(t2.address) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

is not equivalent to

QUERY B:

SELECT Address, State, 
    (SELECT Count(*)
    FROM TempAllAddresses innerQry 
    WHERE innerQry.address LIKE outerQry.address + '%' 
        AND innerQry.state = outerQry.state) As NumEntities
FROM TempAllAddresses outerQry

because B produces 1 row for each row in the original table (TempAllAddresses), whereas A will group together rows in the original table that have the same state and address. To solve this, GROUP BY t1.ID, t1.State, t1.Address instead.

Upvotes: 0

Spiny Norman

Reputation: 8327

EDIT: [snip old stuff]

Try this:

SELECT t1.Address, t1.State, COUNT(distinct t2.id) As NumEntities
FROM TempAllAddresses t1
INNER JOIN TempAllAddresses t2
 ON t1.state = t2.state
 AND T2.Address LIKE t1.address + '%'
GROUP BY t1.State, t1.Address

Upvotes: 0

Guillem Vicens

Reputation: 3996

Try this instead:

SELECT Orig_Address, State, COUNT(Similar_Address)
From
(
  SELECT t1.Address Orig_Address, 
         t1.State   State, 
         t2.address Similar_Address
    FROM TempAllAddresses t1
   INNER JOIN TempAllAddresses t2
      ON t1.state = t2.state
     AND T2.Address LIKE t1.address + '%'
     AND t1.address <> t2.address
)
GROUP BY State, Orig_Address

EDIT: forgot to include the difference between t1.address and t2.address, as @Spiny Norman said, since you probably do not want to compare an address to itself.

HTH

Upvotes: 0

What is wrong with this join of a table to itself?

Answers (7)

Related Questions