Win
Win

Reputation: 99

Databricks SQL Query - How to compare columns in multiple tables

I am trying to create a sql query on databricks to retrieve and tag records using some conditions.

I have a Policy table that looks like this:

PolicyId ClientId AgentId
P1234 C1234 A1234
P1235 C1234 A1235
P1236 C1234 A1236
P1237 C1235 A1234
P1238 C1235 A1234
P1239 C1236 A1237
P1230 C1236 A1238
P1241 C1244 A1238
P1242 C1255 A1238

I also have a Agent & Client lookup tables:

AgentId AgentAddress AgentFirstName AgentLastName
P1234 Test Address1 John Anderson
P1235 Test Address2 Mike Smith
P1236 Test Address3 Peter Smith
P1237 Test Address4 Allan Test
P1238 Test Address5 Roger Test2
ClientId ClientAddress ClientFirstName ClientLastName
C1234 Test Address1 Client1 LastName1
C1235 Test Address2 Client2 LastName2
C1236 Test Address3 Client3 Smith
C1244 Test Address4 Client4 Test
C1255 Test Address5 Client5 TEst2

I need to write a query in databricks sql that would determine Policies that has a Client and Agent that share the same address and flag them as orange. Also, to determine Policies that has its Client and Agent share the same address and have the same Surname and mark then as green.

A possible result set could be like this:

PolicyId flag
P1234 orange
P1235 orange
P1236 green
P1237 green

I hope I explained this well, I can give more information if needed.

Thank you for your help!

Upvotes: 0

Views: 1887

Answers (1)

Jonathan Willcock
Jonathan Willcock

Reputation: 5255

I have never used databricks but the SQL below is very standard, so I would expect it to work.

SELECT p.PolicyId, 
        CASE WHEN c.ClientLastName = a.AgentLastName 
        THEN 'green' ELSE 'orange' END AS flag
FROM Policy p
INNER JOIN Client c ON p.ClientId = c.ClientId 
INNER JOIN Agent a ON p.AgentId = a.AgentId 
AND a.AgentAddress = c.ClientAddress

Given your sample data this yields:

PolicyId Flag
P1234 orange
P1242 green

Please note I am assuming that in your test data the AgentIds should begin with an A not a P. Also Policy P1235 does not get included, because although the client is the same as P1234 the agent is not, so the address is also not matching. Same logic applies for P1236 and P1237. P1242 is however included, as ClientId C1255 shares the same address (and LastName hence green not orange) as the AgentId A1238 (assuming case insensitive comparison).

Upvotes: 1

Related Questions