Reputation: 23
The Mistake
Originally, there was a one-one relationship between Orgs and Servers where the key of Server was simply an OrganizationId, well this was a pretty bad design as business logic changed and now multiple Orgs can have the same server. Before we made the changes, we just duplicated servers for each org, so multiple Orgs would have Servers with the same Subdomains. Below is the current setup.
Requirements
First off, this is unfortunately on prod with a lot of data, so deleting the whole database with the correct model is kind of off the table.
What we would like to do is now remove duplicate Servers on distinct Subdomains, for example if Org1 and Org2 had Ser1 and Ser2 both with the subdomain "test", we would make the FK Org.Server_Id be the lowest occurrence of a server with that domain, in this case Ser1, so that for both Org1 and Org2 their servers would be Ser1. Below is a high tech excel example:
Things we have tried
We were able to get as far as getting Org.Server_Id to be the correct value based on Server.OrganizationId via:
UPDATE Organization
SET Server_Id = t.Id
FROM(
SELECT Id, OrganizationId
FROM Server
) t
WHERE t.OrganizationId = Organization.Id
but whenever we try and go further, we get stuck because we cant use ORDER BY in the inner FROM to try and grab the first occurrence in some aggregate way.
This is finally what we got to, but of course it doesn't work because we cant access t inside the inner from, and I also don't think this is even the correct path to be following:
UPDATE Organization SET Server_Id = t.Id
FROM
(
SELECT Id, OrganizationId
FROM (
SELECT TOP(1) Subdomain, Id, OrganizationId
FROM Server
WHERE Subdomain = t.Subdomain
) a
) t
WHERE t.OrganizationId = Organization.Id
Upvotes: 0
Views: 42
Reputation: 23
After using the partition suggested by @user2731076, we were able to modify our query to this:
UPDATE Organization SET Server_Id = t.Id
FROM
(
SELECT ROW_NUMBER() OVER (PARTITION BY Subdomain ORDER BY [Server].Id ASC) As row_num, [Server].Id, OrganizationId, Subdomain
FROM Organization
INNER JOIN [Server] ON [Server].OrganizationId = [Organization].Id
) t
WHERE t.Subdomain IN(SELECT Subdomain FROM Server WHERE OrganizationId = Organization.Id) AND row_num = 1
The issue we had with our code:
t.OrganizationId = Organization.Id
Was that since there was always a Server associated with an Org, it would just set the value of Org.Server_Id to what it was already set to. So what we wanted to find the first instance of is the row_num = 1 of the Server that had a subdomain similar to the subdomain of the current Org's server. This required the Inner Join to grab it from the partition, and to grab it from the current org via the IN statement in the WHERE clause, so we could do t.Subdomain = subdomain for our Org.
There is probably a more efficient way to do this, and we will look into it in the future.
Upvotes: 0
Reputation: 813
I can't say I completely understand everything you have going on, but in the past when I have needed to grab the first entry of duplicate information I used a Partition function in the inner query. I don't know how you want to order the results but it would look something like this:
(
SELECT ROW_NUMBER() OVER (PARTITION BY column1, column2, etc... ORDER BY columnX DESC/ASC) As row_num, Id, OrganizationId
FROM Server
) t
WHERE t.OrganizationId = Organization.Id AND row_num = 1
That would be essentially the same thing as what you tried to do in your second code block (I believe). The column1 and column2 would be the set of columns of duplicate data that you want to collapse into one entry and columnX would be the column to order the results by. By having row_num = 1 in the WHERE statement, you would only get back the first result for each unique column1, column2, etc.. combo from the inner query.
Upvotes: 1