Kristian Nissen
Kristian Nissen

Reputation: 1184

Doing advanced logic in SQL (T-SQL) - Matching algorithm

Ok, so this particular use case is giving me quite a bit of headache.

This is a edit of another post, where i forgot to add an important aspect to the usecase.

What is need to do is match a Candidates WorkWish with the CustomerJobQueries in the database.

This means match if the DateTimes in the WorkWish is overlapping with any of the CustomerJobQueries. then i need to match the GeographicalPreference with the location of the Customer.

The logic we had when we build the database was this:

It is possible for multiple rows to exists such that 2 specific municipalities is set, the only rule is the there will never exist a row that takes precedence over another.

meaning

GeographicalPreferenceId Country_ShortName    Region_Id   Municipality_Id WorkWish_Id
------------------------ -------------------- ----------- --------------- -----------
1                        DK                   NULL        NULL            1
2                        DK                   3           NULL            1

Is NOT possible because one row already specifies he wishes to work in the entire country.

This is how a snippet of the database looks like

DateRange Table -- contains all DateRanges

DateRangeId StartDate                   EndDate                     CustomerJobQuery_Id WorkWish_Id
----------- --------------------------- --------------------------- ------------------- -----------
1           2014-03-31 00:00:00.0000000 2014-08-18 00:00:00.0000000 NULL                1
2           2014-08-25 00:00:00.0000000 2014-09-15 00:00:00.0000000 NULL                1
3           2013-08-24 00:00:00.0000000 2014-09-25 00:00:00.0000000 1                   NULL

Workwish Table

Id          Candidate_Id
----------- ------------
1           5

CustomerJobQuery Table

Id          Customer_CustomerId Country_ShortName  Region_Id  Municipality_Id
----------- ------------------- -----------------  ---------  -----------------
1           2                   DK                 2          3

GeographicalPreference Table

GeographicalPreferenceId Country_ShortName    Region_Id   Municipality_Id WorkWish_Id
------------------------ -------------------- ----------- --------------- -----------
1                        DK                   2           NULL            1
2                        DK                   3           NULL            1

What is have so far is; - Returns the CustomerJobQueries that overlap with the WorkWish dates.

SELECT c.*
 FROM CustomerJobQuery c
  WHERE EXISTS ( 
   SELECT *
     FROM Workwish w
     INNER JOIN DateRange d1 ON d1.CustomerJobQuery_Id = c.Id
     INNER JOIN DateRange d2 ON d2.WorkWish_Id = w.Id
     WHERE w.Candidate_Id = @CandidateId -- input from my **StoredProcedure**
     -- overlap check:
     AND d1.StartDate <= d2.EndDate
     AND d2.StartDate <= d1.EndDate

)

I am uncertain how i would do the geographicalpreference logic in a setbased manner. - currently our solution is done in C# loading everything into memory and looping over everything with nested foreaches which is horribly ineffective. And is an absolute worst case scenario.

Any help is much appreciated!

Upvotes: 2

Views: 147

Answers (2)

Roger Wolf
Roger Wolf

Reputation: 7692

I have created some tables, since you haven't provided any DDL.

declare @CandidateId int;

declare @WorkWish table (
    Id int primary key,
    CandidateId int
);

declare @Preference table (
    Id int primary key,
    WishId int,
    Country char(2),
    RegionId int null,
    MunicipalityId int null
);

declare @DateRange table (
    Id int primary  key,
    StartDate datetime,
    EndDate datetime,
    JobQueryId int null,
    WishId int null
);

declare @JobQuery table (
    Id int primary key,
    CustomerId int,
    Country char(2),
    RegionId int,
    MunicipalityId int
);


select *,
    case
        when p.MunicipalityId is not null then 0
        when p.RegionId is not null then 1
        else 2
    end as [LocationMetric]
from @WorkWish w
    inner join @Preference p on w.Id = p.WishId
    inner join @DateRange dw on dw.WishId = w.Id
    inner join @JobQuery j on j.Country = p.Country
    inner join @DateRange dj on dj.JobQueryId = j.Id
where w.CandidateId = @CandidateId
    and dw.StartDate <= dj.EndDate
    and dj.StartDate <= dw.EndDate
    and nullif(p.RegionId, j.RegionId) is null
    and nullif(p.MunicipalityId, j.MunicipalityId) is null
order by LocationMetric;

I have to warn you about the performance of the NULLIF() function, however - it's far from perfect. You can try and expand it into a corresponding case construct, but usually it isn't worth the effort.

Still, it should be better than what you have now.

Upvotes: 1

Gordon Linoff
Gordon Linoff

Reputation: 1269953

I think the following will get you all matches:

WITH ww as (
      SELECT ww.*, drw.StartDate, drw.EndDate
      FROM WorkWish ww JOIN
           DateRange drw
           ON drw.WorkWish_Id = w.id
     )
SELECT c.*
FROM CustomerJobQuery c JOIN
     DateRange drc
     ON drc.CustomerJobQuery_Id = c.id JOIN
     ww
     ON drc.StartDate <= ww.EndDate AND ww.StartDate <= drw.EndDate AND
        (drc.Country_ShortName = ww.Country_ShortName and ww.Region_id is NULL or
         drc.Country_ShortName = ww.Country_ShortName and drc.Region_id = ww.Region_id and ww.municipality_id is null or
         drc.Country_ShortName = ww.Country_ShortName and drc.Region_id = ww.Region_id and drc.Municipality_Id = ww.Munipality_Id
        )
)

If you want to prioritize the matches, then use row_number() and a subquery:

WITH ww as (
      SELECT ww.*, drw.StartDate, drw.EndDate
      FROM WorkWish ww JOIN
           DateRange drw
           ON drw.WorkWish_Id = w.id
     )
SELECT c.*
FROM (SELECT c.*, ww.id as WorkId,
             ROW_NUMBER() OVER (PARTITION BY c.id
                                ORDER BY (case when ww.Munipality_Id is not null then 1
                                               when ww.Region_id is not null then 2
                                               else 3
                                          end)
                               ) as seqnum
      FROM CustomerJobQuery c JOIN
           DateRange drc
           ON drc.CustomerJobQuery_Id = c.id JOIN
           ww
           ON drc.StartDate <= ww.EndDate AND ww.StartDate <= drw.EndDate AND
              (drc.Country_ShortName = ww.Country_ShortName and ww.Region_id is NULL or
               drc.Country_ShortName = ww.Country_ShortName and drc.Region_id = ww.Region_id and ww.municipality_id is null or
               drc.Country_ShortName = ww.Country_ShortName and drc.Region_id = ww.Region_id and drc.Municipality_Id = ww.Munipality_Id
              )
     ) c
WHERE seqnum = 1;

Upvotes: 0

Related Questions