Reputation: 599
I'm working with SQL Server 2012 and have the following tables : Ownership
, Property
, Person
The table Person
holds information about people such as first name, last name and this table has PersonId
as primary key.
The table Property
holds information about property such as property area, property description .. and this table has PropertyId
as primary key
Because each person can have more than one property, and each ownership of a property can be more than one person, then we have a many-to-many relationship between Person
and Property
So I created the table Ownership
to break this relationship, so this table has PersonId
and PropertyId
as foreign keys, and the following columns : PropertyId
as "Primary key", StartDate
, EndDate
and OwnershipPercent
.
Start Date
and End Date
refer to the period in which the property is owned by someone, and the OwnershipPercent
refers to the person's share of the property.
Now I would to write a query to return any property that is owned by more than one person by more than 100% in the same time
For example:
Property with Id=1
belongs to person with #1 from 1-1-2010 to 1-1-2012 and his share of this property is 90%, and this property also belongs to another person with #2 from 1-1-2010 to 1-1-2012 and his share of this property is 80%.. as we see if we sum 90+80=170% at same time and this is wrong (because it would be less than 100% at same time)
I wrote the following query:
SELECT A.PropertyId
FROM Ownership A INNER JOIN Ownership B
ON a.PersonId <> b.PersonId
AND A.PropertyId = B.PropertyId
AND A.StartDate <= B.EndDate
AND A.EndDate >= B.StartDate
group by A.PropertyId
Having (sum(A.OwnershipPercent)) <=100;
but if we have a property that belongs to 5 people, it makes (5×4)=20 sums and this is wrong
How to fix this ?
Upvotes: 4
Views: 4525
Reputation: 77657
The following is similar to @Gordon Linoff's suggestion in that it also "decomposes" the list of ranges into the list of start and end dates. However, it uses a different technique on the resulting list. It also assumes that only the beginning date is inclusive, while the ending date is not.
WITH unpivoted AS (
SELECT
PropertyId,
EventDate,
OwnershipPercent,
PercentFactor = CASE EventDateType WHEN 'EndDate' THEN -1 ELSE 1 END
FROM Ownership
UNPIVOT (
EventDate FOR EventDateType IN (StartDate, EndDate)
) u
)
, summedup AS (
SELECT DISTINCT
PropertyId,
EventDate,
TotalPercent = SUM(OwnershipPercent * PercentFactor)
OVER (PARTITION BY PropertyId ORDER BY EventDate)
FROM unpivoted
)
SELECT
s.EventDate,
s.TotalPercent,
o.PropertyId,
o.PersonId,
o.StartDate,
o.EndDate,
o.OwnershipPercent
FROM summedup s
INNER JOIN Ownership o
ON s.PropertyId = o.PropertyId
AND s.EventDate >= o.StartDate
AND s.EventDate < o.EndDate
WHERE TotalPercent > 100 -- changed from the original "<= 100"
-- based on the verbal description
;
To explain how this works, I'll assume the contents of Ownership
to be the following:
PropertyId PersonId StartDate EndDate OwnershipPercent
---------- -------- ---------- ---------- ----------------
1 1 2010-01-01 2012-01-01 80
1 2 2011-01-01 2011-03-01 20
1 3 2011-02-01 2011-04-01 10
1 4 2011-05-01 2011-07-01 40
Now, you can see that at the first step, unpivoting, not merely every row of the original table is replaced with two rows, but also every percent value is marked as either an increment (PercentFactor = 1
) or a decrement (PercentFactor = -1
), depending on whether it comes along with the starting date or with the ending date. So, the unpivoted
CTE evaluates to the following result set:
PropertyId EventDate OwnershipPercent PercentFactor
---------- ---------- ---------------- -------------
1 2010-01-01 80 1
1 2011-01-01 20 1
1 2011-02-01 10 1
1 2011-03-01 20 -1
1 2011-04-01 10 -1
1 2011-05-01 40 1
1 2011-07-01 40 -1
1 2012-01-01 80 -1
At this point, the idea is basically to calculate running totals of OwnershipPercent
at every EventDate
for every PropertyId
, taking into account whether the value is incremented or decremented. (You could, in fact, incorporate the sign into OwnershipPercent
at the first stage instead of allocating a separate column of PercentFactor
. I chose the latter as somewhat better illustrating the idea, but there should be no performance penalties if you preferred the former.) And this is what you get after calculating the running totals (which is what the second CTE, summedup
, does):
PropertyId EventDate TotalPercent
---------- ---------- ------------
1 2010-01-01 80
1 2011-01-01 100
1 2011-02-01 110
1 2011-03-01 90
1 2011-04-01 80
1 2011-05-01 120
1 2011-07-01 80
1 2012-01-01 0
Note, however, that this resulting set may contain duplicate rows. Particularly, it will if, for the same PropertyId
, some ranges start or end simultaneously or some range ends exactly at the start date of another range. That's why you can see DISTINCT used at this stage.
Now that total percent values at key dates are known, those not exceeding 100 can just be filtered out and the rest joined back to Ownership
for the access to the details of the ownerships contributing to the obtained totals. So, the main query gives you this as the final result:
EventDate TotalPercent PropertyId PersonId StartDate EndDate OwnershipPercent
---------- ------------ ---------- -------- ---------- ---------- ----------------
2011-02-01 110 1 1 2010-01-01 2012-01-01 80
2011-02-01 110 1 2 2011-01-01 2011-03-01 20
2011-02-01 110 1 3 2011-02-01 2011-04-01 10
2011-05-01 120 1 1 2010-01-01 2012-01-01 80
2011-05-01 120 1 4 2011-05-01 2011-07-01 40
You can also have a look at (as well as play with) this query at SQL Fiddle.
Upvotes: 0
Reputation: 16894
This request is probably necessary to you
SELECT PropertyID,
FROM dbo.Ownership
GROUP BY PropertyID, StartDate, EndDate
HAVING COUNT(PersonID) > 1
AND SUM(OwnershipPercent) <= 100 --in your question you want > 100
Upvotes: 0
Reputation: 1269483
I think the approach of a join on the ownership table is not quite right. I see what you are trying to do, but the join is creating pairs of owners. You want to think about sets of owners, instead.
My approach is to create a table with all the important dates for each property. This would be the StartDate and EndDate in the OwnerShip table. Then, let's look at the ownership percentages on these dates:
select os.PropertyId, thedate, SUM(os.OwnershipPercent)
from ((select PropertyId, StartDate as thedate
from ownership
)union
(select PropertyId, EndDate
from ownership
)
) driver join
OwnerShip os
on driver.PropertyId = os.PropertyId and
driver.thedate between os.StartDate and os.EndDate
group by os.PropertyId, thedate
having SUM(os.OwnershipPercent) <= 100 -- Do you really want > 100 here?
One key difference is that this query is aggregating on PropertyId and the date. Tis makes sense because the amount of ownership can change over time.
Upvotes: 2
Reputation: 10063
DISTINCT
will do right,
SELECT A.PropertyId
FROM Ownership A INNER JOIN Ownership B
ON a.PersonId <> b.PersonId
AND A.PropertyId = B.PropertyId
AND A.StartDate <= B.EndDate
AND A.EndDate >= B.StartDate
group by A.PropertyId
Having (sum(distinct A.OwnershipPercent)) <=100;
Upvotes: 1