Mr.Glaurung
Mr.Glaurung

Reputation: 596

Find any part of string in another string transact sql

I have strings in my database that represents days in the week like this:

1234567 (all days of the week)
1230567 (all days but Thursday, EU standard - day 1 is Monday)
0000067 (no day apart from Saturday and Sunday)

And I need to write an SQL question that checks for overlaps. For instance:

1234500 and 0000067 are NOT overlapping.
while 1234500 and 0030000 are overlapping (the 3).
and 1234500 and 0000567 IS overlapping (the 5).

Each entry has an ID, customer number, and this weekday representation.

I was thinking something like this:

SELECT
    *
FROM dbo.Customers c
JOIN dbo.Customers c2 ON c.CustomerNumber = c2.CustomerNumber
    AND c.Days <> c2.Days
WHERE 1 = 1
    AND ...?

To get two entries that are the same customer but when I come to the WHERE statement I hit a blank. Finding a substring (for instance 3) in both Days fields is very easy, but when any one of the 7 entries can be overlapping and I have to exclude 0 (not active day) then I get confused.

I need some help.

Upvotes: 2

Views: 545

Answers (3)

Alan Burstein
Alan Burstein

Reputation: 7918

Without any DDL (underlying table structure) it's impossible to understand where the data lives that you'll be comparing. That said, what you are trying to do will be simple using ngrams8k.

Note this query:

declare @searchstring char(7) = '1234500';
select * from dbo.ngrams8k(@searchstring,1);

returns

position    token 
----------- ------
1           1     
2           2     
3           3     
4           4     
5           5     
6           0     
7           0     

with that in mind, this will help you:

-- sample data
declare @days table (daystring char(7));
insert @days values ('0000067'),('0030000'),('0000567');

declare @searchstring char(7) = '1234500';

-- how to break down and compare the strings
select 
  searchstring    = @searchstring, 
  overlapstring   = OverlapCheck.daystring,
  overlapPosition = OverlapCheck.position,
    overlapValue    = OverlapCheck.token
from dbo.ngrams8k(@searchstring, 1) search
join 
(
  select * 
  from @days d
  cross apply dbo.ngrams8k(d.daystring,1)
  where token <> 0
) OverlapCheck on  search.position = OverlapCheck.position 
                  and search.token = OverlapCheck.token;

Returns:

searchstring overlapstring overlapPosition      overlapValue
------------ ------------- -------------------- ---------------
1234500      0030000       3                    3
1234500      0000567       5                    5

Upvotes: 1

SqlZim
SqlZim

Reputation: 38023

Using two common table expressions, one for a tiny tally table, the other to split Days with cross apply() and substring() for each position using the tiny tally table along with the count() over() windowed aggregation function to count the occurrences of each Day by CustomerNumber. The final select shows each overlapping Day:

;with n as (
  select i from (values (1),(2),(3),(4),(5),(6),(7)) t(i)
)
, expand as (
  select c.CustomerNumber, c.Days, d.Day
    , cnt = count(*) over (partition by c.CustomerNumber, d.Day)
  from Customers c
    cross apply (
      select Day = substring(c.Days,n.i,1)
      from n
      ) d
  where d.Day > 0
)
select * 
from expand
where cnt > 1

rextester demo: http://rextester.com/SZUANG12356

with test setup:

create table Customers (customernumber int, days char(7))
insert into Customers values 
 (1,'1234500')
,(1,'0000067') -- NOT overlapping
,(2,'1234500')
,(2,'0030000') -- IS overlapping (the 3).
,(3,'1234500')
,(3,'0000567') -- IS overlapping (the 5).
;  

returns:

+----------------+---------+-----+-----+
| CustomerNumber |  Days   | Day | cnt |
+----------------+---------+-----+-----+
|              2 | 1234500 |   3 |   2 |
|              2 | 0030000 |   3 |   2 |
|              3 | 1234500 |   5 |   2 |
|              3 | 0000567 |   5 |   2 |
+----------------+---------+-----+-----+


Reference:

Upvotes: 1

AB_87
AB_87

Reputation: 1156

One way of doing it. Matching every days string char by char and ignoring 0s (by replacing with non matching values). Below query will return rows where there was no overlapping days (ignoring 0s) for same customer.

SELECT
    *
FROM Customers c
JOIN Customers c2 ON c.CustomerNumber = c2.CustomerNumber
and c.days <> c2.days
where
(
REPLACE (substring (c.[days],1,1),'0','8') <> REPLACE (substring (c2.[days],1,1) ,'0','9')
AND 
REPLACE (substring (c.[days],2,1),'0','8') <> REPLACE (substring (c2.[days],2,1) ,'0','9')
AND
REPLACE (substring (c.[days],3,1),'0','8') <> REPLACE (substring (c2.[days],3,1) ,'0','9')
AND
REPLACE (substring (c.[days],4,1),'0','8') <> REPLACE (substring (c2.[days],4,1) ,'0','9')
AND 
REPLACE (substring (c.[days],5,1),'0','8') <> REPLACE (substring (c2.[days],5,1) ,'0','9')
AND
REPLACE (substring (c.[days],6,1),'0','8') <> REPLACE (substring (c2.[days],6,1) ,'0','9')
AND
REPLACE (substring (c.[days],7,1),'0','8') <> REPLACE (substring (c2.[days],7,1) ,'0','9')
)

Upvotes: 1

Related Questions