MEF
MEF

Reputation: 49

Pattern matching to create a new column in sql?

Looking for some guidance on where to start with a project that likely uses pattern matching in some form. Currently I have a table with a column of property identification numbers that resembles below:

PropertyID|   Property Name|  

A001          Jefferson
A002          Madison
A002-01       Madison Outhouse
A002-02       James Madison Statue
A003          Franklin
A004          Jackson
A004-field2   Jackson Fields
...           ...
A2001         Tubman

The PropertyID field is the primary key for the dataset. The string before the hyphen represents a parent property, while the values after the hyphen indicate sub-properties associated with the parent. What I'm attempting to create is a new column called "Parent Property Name" That would look like below:

PropertyID|   Property Name|          | Parent Property Name|

A001          Jefferson                 Jefferson
A002          Madison                   Madison
A002-01       Madison Outhouse          Madison
A002-02       James Madison Statue      Madison
A003          Franklin                  Franklin
A004          Jackson                   Jackson
A004-field2   Jackson Fields            Jackson
...           ...
A2001         Tubman                    Tubman

Basically what I'm trying to accomplish is if the string is unique before the hyphen/does not have a hyphen return "Property Name" as "Parent Property Name", If there is a hyphen return the "Property Name" value of of the parent (i.e. matching PropertyID before hyphen that does not contain a hyphen itself.

Any thoughts on how I should approach this would be greatly appreciated.

Upvotes: 0

Views: 817

Answers (4)

Kashyap MNVL
Kashyap MNVL

Reputation: 603

This can be achieved by a JOIN as follows: We are extracting the Substring part of the PropertyID before the "-" symbol and joining back to the same table with this. IF a record does not have "-" symbol, induce one so the substring part always returns something and also the CHARINDEX() -1 part does not return a negative value to the SUBSTRING function for length argument.

DECLARE @Temp AS TABLE ( PropertyID NVARCHAR(MAX), PropertyName NVARCHAR(MAX))
INSERT INTO @Temp ( PropertyID ,
                    PropertyName )


SELECT 'A001',          'Jefferson'              UNION ALL
SELECT 'A002',          'Madison'                UNION ALL
SELECT 'A002-01',       'Madison Outhouse'       UNION ALL
SELECT 'A002-02',       'James Madison Statue'   UNION ALL
SELECT 'A003',          'Franklin'               UNION ALL
SELECT 'A004',          'Jackson'                UNION ALL
SELECT 'A004-field2',   'Jackson Fields'         UNION ALL
SELECT 'A2001',         'Tubman'                 


    SELECT *, CHARINDEX('-',T1.PropertyID)
FROM @Temp T1 LEFT JOIN @Temp T2 
ON SUBSTRING(T1.PropertyID,1,CHARINDEX('-',T1.PropertyID+'-')-1) =  T2.PropertyID

Upvotes: 0

JohnLBevan
JohnLBevan

Reputation: 24470

This should do what you're after: SQL Fiddle

with CTE as 
(
  select PropertyId
  , PropertyName
  , PropertyId ParentId
  , PropertyName ParentName
  , PropertyId AncestorId
  , PropertyName AncestorName
  from Property
  where PropertyId not like '%-%'

  union all

  select b.PropertyId
  , b.PropertyName
  , a.PropertyId ParentId
  , a.PropertyName ParentName
  , a.AncestorId
  , a.AncestorName
  from CTE a
  inner join Property b
  on b.PropertyId like a.PropertyId + '-%'
  and b.PropertyId not like a.PropertyId + '-%-%' --avoid having grandchildren too soon

)
select * 
from CTE 
order by PropertyId

The first block of code gets all root elements; i.e. those without hyphens. These values are their own parents & ancestors. These are also the only values that will appear as ancestors of other results, since they're at the top level.

select PropertyId
, PropertyName
, PropertyId ParentId
, PropertyName ParentName
, PropertyId AncestorId
, PropertyName AncestorName
from Property
where PropertyId not like '%-%'

We then use the recursive bit to capture the child elements. This takes those elements from the table whose PropertyId is the same as the table's existing records' PropertyId, only with a hyphen and a new value on the end. We exclude those which have a second hyphen (or more) after this, to ensure we don't list a great grandchild as a child.

In this same block, we assign the values differently; rather than Property, Parent, and Ancestor receiving the same values, Property is the current property, Parent is the PropertyId of the parent record, and Ancestor is the Parent record's Ancestor.

  select b.PropertyId
  , b.PropertyName
  , a.PropertyId ParentId
  , a.PropertyName ParentName
  , a.AncestorId
  , a.AncestorName
  from CTE a
  inner join Property b
  on b.PropertyId like a.PropertyId + '-%'
  and b.PropertyId not like a.PropertyId + '-%-%'

Note

I wondered if my use of LIKE was appropriate, or if alternatives such as CHARINDEX may perform better. In investigating I found this post which shows that CharIndex is best, but per the comments others have found different results, and running this myself I also saw inconsistent behaviours on different machines. So I suspect this is a scenario prone to premature optimisation; i.e. make sure what's there works; then if you need to optimise it test those optimisations out on the environment this code will be run on to ensure the stats are relevant.

That said, here's a version using CHARINDEX, should you wish to test / compare. This is based on @GordonLinoff's solution. SQL Fiddle

with CTE as 
(
  select PropertyId
  , PropertyName
  , PropertyId ParentId
  , PropertyName ParentName
  , PropertyId AncestorId
  , PropertyName AncestorName
  , 0 LastHyphenCharIndex
  from Property
  where charindex('-', PropertyId) = 0

  union all

  select b.PropertyId
  , b.PropertyName
  , a.PropertyId 
  , a.PropertyName 
  , a.AncestorId
  , a.AncestorName
  , charindex('-',b.PropertyId, a.LastHyphenCharIndex)
  from CTE a
  inner join Property b
  --on left(b.PropertyId, charindex('-',b.PropertyId, a.LastHyphenCharIndex)) = a.PropertyId + '-'
  on left(b.PropertyId, nullif(charindex('-',b.PropertyId, a.LastHyphenCharIndex),0)-1) = a.PropertyId 

)
select * 
from CTE 
order by PropertyId

Upvotes: 0

Gordon Linoff
Gordon Linoff

Reputation: 1270513

I think you can do this using a join: SQL Fiddle

select t.*, tparent.PropertyId
from t join
     t tparent
     on left(t.PropertyId, charindex('-', t.PropertyId + '-') - 1) = tparent.PropertyId;

You can also use first_value(): SQL Fiddle

select t.*,
       first_value(propertyId) over (partition by left(t.PropertyId, charindex('-', t.PropertyId + '-') - 1)
                                     order by PropertyId) as parent_propertyid
from t;

Upvotes: 1

Yogesh Sharma
Yogesh Sharma

Reputation: 50173

You can use subquery :

select *,
   (case when charindex ('-', ltrim(rtrim(PropertyID))) > 0
         then (select top 1 t1.PropertyName 
               from table t1
               where t1.PropertyID = left(t.PropertyId, charindex('-', t.PropertyId + '-') - 1)
              ) else PropertyName
     end) as  [Parent Property Name]
from table t;

Upvotes: 0

Related Questions