Mohamad Ghanem
Mohamad Ghanem

Reputation: 599

Join tables In SQL with SUM function

I'm working with SQL Server 2012 and have the following tables : Ownership, Property, Person

The table Person holds information about people such as first name, last name and this table has PersonId as primary key.

The table Property holds information about property such as property area, property description .. and this table has PropertyId as primary key

Because each person can have more than one property, and each ownership of a property can be more than one person, then we have a many-to-many relationship between Person and Property

So I created the table Ownership to break this relationship, so this table has PersonId and PropertyId as foreign keys, and the following columns : PropertyId as "Primary key", StartDate, EndDate and OwnershipPercent.

Start Date and End Date refer to the period in which the property is owned by someone, and the OwnershipPercent refers to the person's share of the property.

Now I would to write a query to return any property that is owned by more than one person by more than 100% in the same time

For example:

Property with Id=1 belongs to person with #1 from 1-1-2010 to 1-1-2012 and his share of this property is 90%, and this property also belongs to another person with #2 from 1-1-2010 to 1-1-2012 and his share of this property is 80%.. as we see if we sum 90+80=170% at same time and this is wrong (because it would be less than 100% at same time)

I wrote the following query:

SELECT A.PropertyId
FROM Ownership A INNER JOIN Ownership B
ON a.PersonId <> b.PersonId
AND A.PropertyId = B.PropertyId
AND A.StartDate <= B.EndDate
AND A.EndDate >= B.StartDate
group by A.PropertyId
Having (sum(A.OwnershipPercent)) <=100; 

but if we have a property that belongs to 5 people, it makes (5×4)=20 sums and this is wrong

How to fix this ?

Upvotes: 4

Views: 4525

Answers (4)

Andriy M
Andriy M

Reputation: 77657

The following is similar to @Gordon Linoff's suggestion in that it also "decomposes" the list of ranges into the list of start and end dates. However, it uses a different technique on the resulting list. It also assumes that only the beginning date is inclusive, while the ending date is not.

WITH unpivoted AS (
  SELECT
    PropertyId,
    EventDate,
    OwnershipPercent,
    PercentFactor = CASE EventDateType WHEN 'EndDate' THEN -1 ELSE 1 END
  FROM Ownership
  UNPIVOT (
    EventDate FOR EventDateType IN (StartDate, EndDate)
  ) u
)
, summedup AS (
  SELECT DISTINCT
    PropertyId,
    EventDate,
    TotalPercent = SUM(OwnershipPercent * PercentFactor)
                   OVER (PARTITION BY PropertyId ORDER BY EventDate)
  FROM unpivoted
)
SELECT
  s.EventDate,
  s.TotalPercent,
  o.PropertyId,
  o.PersonId,
  o.StartDate,
  o.EndDate,
  o.OwnershipPercent
FROM summedup s
  INNER JOIN Ownership o
     ON s.PropertyId = o.PropertyId
    AND s.EventDate >= o.StartDate
    AND s.EventDate <  o.EndDate
WHERE TotalPercent > 100  -- changed from the original "<= 100"
                          -- based on the verbal description
;

To explain how this works, I'll assume the contents of Ownership to be the following:

PropertyId PersonId StartDate  EndDate    OwnershipPercent
---------- -------- ---------- ---------- ----------------
1          1        2010-01-01 2012-01-01 80
1          2        2011-01-01 2011-03-01 20
1          3        2011-02-01 2011-04-01 10
1          4        2011-05-01 2011-07-01 40

Now, you can see that at the first step, unpivoting, not merely every row of the original table is replaced with two rows, but also every percent value is marked as either an increment (PercentFactor = 1) or a decrement (PercentFactor = -1), depending on whether it comes along with the starting date or with the ending date. So, the unpivoted CTE evaluates to the following result set:

PropertyId EventDate  OwnershipPercent PercentFactor
---------- ---------- ---------------- -------------
1          2010-01-01 80               1
1          2011-01-01 20               1
1          2011-02-01 10               1
1          2011-03-01 20               -1
1          2011-04-01 10               -1
1          2011-05-01 40               1
1          2011-07-01 40               -1
1          2012-01-01 80               -1

At this point, the idea is basically to calculate running totals of OwnershipPercent at every EventDate for every PropertyId, taking into account whether the value is incremented or decremented. (You could, in fact, incorporate the sign into OwnershipPercent at the first stage instead of allocating a separate column of PercentFactor. I chose the latter as somewhat better illustrating the idea, but there should be no performance penalties if you preferred the former.) And this is what you get after calculating the running totals (which is what the second CTE, summedup, does):

PropertyId EventDate  TotalPercent
---------- ---------- ------------
1          2010-01-01 80
1          2011-01-01 100
1          2011-02-01 110
1          2011-03-01 90
1          2011-04-01 80
1          2011-05-01 120
1          2011-07-01 80
1          2012-01-01 0

Note, however, that this resulting set may contain duplicate rows. Particularly, it will if, for the same PropertyId, some ranges start or end simultaneously or some range ends exactly at the start date of another range. That's why you can see DISTINCT used at this stage.

Now that total percent values at key dates are known, those not exceeding 100 can just be filtered out and the rest joined back to Ownership for the access to the details of the ownerships contributing to the obtained totals. So, the main query gives you this as the final result:

EventDate  TotalPercent PropertyId PersonId StartDate  EndDate    OwnershipPercent
---------- ------------ ---------- -------- ---------- ---------- ----------------
2011-02-01 110          1          1        2010-01-01 2012-01-01 80
2011-02-01 110          1          2        2011-01-01 2011-03-01 20
2011-02-01 110          1          3        2011-02-01 2011-04-01 10
2011-05-01 120          1          1        2010-01-01 2012-01-01 80
2011-05-01 120          1          4        2011-05-01 2011-07-01 40

You can also have a look at (as well as play with) this query at SQL Fiddle.

Upvotes: 0

Oleksandr Fedorenko
Oleksandr Fedorenko

Reputation: 16894

This request is probably necessary to you

SELECT PropertyID,
FROM dbo.Ownership
GROUP BY PropertyID, StartDate, EndDate
HAVING COUNT(PersonID) > 1 
  AND SUM(OwnershipPercent) <= 100 --in your question you want > 100

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1269483

I think the approach of a join on the ownership table is not quite right. I see what you are trying to do, but the join is creating pairs of owners. You want to think about sets of owners, instead.

My approach is to create a table with all the important dates for each property. This would be the StartDate and EndDate in the OwnerShip table. Then, let's look at the ownership percentages on these dates:

select os.PropertyId, thedate, SUM(os.OwnershipPercent)
from ((select PropertyId, StartDate as thedate
       from ownership
      )union
      (select PropertyId, EndDate
       from ownership
      )
     ) driver join
     OwnerShip os
     on driver.PropertyId = os.PropertyId and
        driver.thedate between os.StartDate and os.EndDate
group by os.PropertyId, thedate
having SUM(os.OwnershipPercent) <= 100  -- Do you really want > 100 here?

One key difference is that this query is aggregating on PropertyId and the date. Tis makes sense because the amount of ownership can change over time.

Upvotes: 2

Mariappan Subramanian
Mariappan Subramanian

Reputation: 10063

DISTINCT will do right,

SELECT A.PropertyId
FROM Ownership A INNER JOIN Ownership B
ON a.PersonId <> b.PersonId
AND A.PropertyId = B.PropertyId
AND A.StartDate <= B.EndDate
AND A.EndDate >= B.StartDate
group by A.PropertyId
Having (sum(distinct A.OwnershipPercent)) <=100; 

Upvotes: 1

Related Questions