Glinkot
Glinkot

Reputation: 2964

SQL Statement to truncate field from start of provided string?

I have a large table in SQL; one field contains the user's name, which is often followed by various things I need to strip off to obtain their 'plain' name (don't ask!) Eg:

<pre>Mark Johnson
Joe Bloggs (DO NOT USE)
Mick Bronson (refer Jim Bloggs)
Jan Morrison
Jemima Thomson refer Joe harrison
Glen Grabs-Moffat try harry

There are ~20 types of postfix. I'd like to create an UPDATE query (probably 20 I'm guessing) that will 'trim' the value from the start of my provided strings eg " (DO" or " (ref" to get "Joe Bloggs" only with no postfix. Preferably it'd be case insensitive.

Any ideas?

Thanks

EDIT:

The code I was using looked like this:

for (int count = 0; count < ExpenseItems.Count; count++)
            {
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(DO NOT").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(DON'T").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(DONT ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(DONOT").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" DO NOT").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" DON'T").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" DONT ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" DONOT").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(pls").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(please").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" pls").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" please").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(refer").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" refer").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" (Re").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" (ref to").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" ref to").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" (refto").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" refto").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" use ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" try ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("(see ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" see ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf("director").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" never ").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.TruncateFromStartOf(" moved").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.Replace("DISABLED", "(D)").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.Replace("disabled", "(D)").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.Replace("Disabled", "(D)").Trim();
                ExpenseItems[count].Requester_Name = ExpenseItems[count].Requester_Name.Replace("DISALBED", "(D)").Trim();
            }

The truncate does what it says on the tin:

    public static string TruncateFromStartOf(this string input, string splitString, bool caseSensitive = false, int offset = 0)
    {
        //Verify input
        if (string.IsNullOrEmpty(input))
            return string.Empty;

        if (string.IsNullOrEmpty(splitString))
            return input;

        int segmentIndex = -1;
        //the start of the segment in the input string
        if (caseSensitive)
        {
            segmentIndex = input.IndexOf(splitString, StringComparison.Ordinal);
        }
        else
        {
            segmentIndex = input.ToLower().IndexOf(splitString.ToLower(), StringComparison.Ordinal);
        }

        if (segmentIndex == -1)
            return input; //nothing to remove

        //Return the parts around the segment
        return input.Substring(0, segmentIndex + offset);
    }

Upvotes: 2

Views: 892

Answers (3)

Aaron Bertrand
Aaron Bertrand

Reputation: 280252

Put your exclusion words in a table instead of storing them in code:

CREATE TABLE dbo.TruncationWords
(
       Word VARCHAR(32) NOT NULL UNIQUE
);

INSERT dbo.TruncationWords(Word) 
SELECT '(DO NOT'
UNION ALL SELECT '(DON''T'
UNION ALL SELECT '(DONT'
UNION ALL SELECT '(DONOT'
UNION ALL SELECT 'DO NOT'
UNION ALL SELECT 'DON''T'
UNION ALL SELECT 'DONT'
UNION ALL SELECT 'DONOT'
UNION ALL SELECT '(pls'
UNION ALL SELECT '(please'
UNION ALL SELECT 'pls'
UNION ALL SELECT 'please'
UNION ALL SELECT '(refer'
UNION ALL SELECT 'refer'
UNION ALL SELECT '(Re'
UNION ALL SELECT '(ref to'
UNION ALL SELECT 'ref to'
UNION ALL SELECT '(refto'
UNION ALL SELECT 'refto'
UNION ALL SELECT 'use'
UNION ALL SELECT 'try'
UNION ALL SELECT '(see'
UNION ALL SELECT 'see'
UNION ALL SELECT 'director'
UNION ALL SELECT 'never'
UNION ALL SELECT 'moved'
UNION ALL SELECT 'disabled';

Now you can easily cross-ref these words against any table/column. For example:

DECLARE @t TABLE (Name VARCHAR(255));

INSERT @t SELECT 'Mark Johnson'
UNION ALL SELECT 'Joe Bloggs (DO NOT USE)'
UNION ALL SELECT 'Mick Bronson (refer Jim Bloggs)'
UNION ALL SELECT 'Jan Morrison'
UNION ALL SELECT 'Jemima Thomson refer Joe harrison'
UNION ALL SELECT 'Glen Grabs-Moffat try harry'
UNION ALL SELECT 'Can''t touch this';

;WITH x AS
(
  SELECT 
    t.Name, 
    Trunc = LEFT(t.Name, CHARINDEX(' ' + w.Word, t.Name)),
    rn = ROW_NUMBER() OVER (PARTITION BY t.Name ORDER BY CHARINDEX(' ' + w.Word, t.Name))
   FROM @t AS t
   INNER JOIN dbo.TruncationWords AS w
   ON CHARINDEX(' ' + w.Word, t.Name) > 0
)
UPDATE src
  SET src.Name = x.Trunc
  FROM @t AS src
  INNER JOIN x 
  ON src.Name = x.Name
  WHERE x.rn = 1;

SELECT Name FROM @t;

Results:

Name
--------------------------
Mark Johnson
Joe Bloggs
Mick Bronson
Jan Morrison
Jemima Thomson
Glen Grabs-Moffat
Can't touch this

This solution makes two assumptions:

  1. That the word you want to truncate is always separated by a space.
  2. That the collation is case insensitive. You can use the COLLATE clause to work around this.

Also I think words like 'see' are problematic. What if someone has the name 'John Seek'?

Upvotes: 2

John Woo
John Woo

Reputation: 263693

Try this:

UPDATE tableName
   SET fieldName = RTRIM((CASE
                       WHEN CHARINDEX('(', NameList) = 0
                       THEN NameList
                       ELSE SUBSTRING(NameList, 1, CHARINDEX('(', NameList) - 1)
                     END))

UPDATE 1

SEE SQLFIDDLE for the Sample SELECT statement which will the be used for updating.

Upvotes: 1

Marc
Marc

Reputation: 11613

Let's assume all of your postfixes begin with (. You could do something like this:

Update SOMETABLE
   SET the_name_field = LEFT(the_name_field, CharIndex('(', the_name_field)-1 )
 Where [conditions are met]
;

Upvotes: 0

Related Questions