Michelle H
Michelle H

Reputation: 45

Oracle SQL remove characters and spaces from end of field

I have a text field in my table where SOME of the data ends with one or more characters in parentheses. However, the number of characters is not consistent, the spacing is not consistent, and the number of instances is not consistent. **

Some samples:

Teacher 1 (L)
Teacher 1 (L) (H)
Secretary 5 (MN
Assistant 6(L)
Principal 9 (XYZ)(A)
Custodian 12
Bus Driver 16 (ABC DEF)
Music (TT) 18

No, that's not a typo in Secretary 5, I have multiple jobs where the trailing parenthesis is missing from the name. It's also not a typo in Assistant 6, not all of them have a space between the end of the job name and the parenthesis.

SO FAR, I have not found any jobs that have information in parentheses in the MIDDLE of the name, but I wouldn't put anything past them. The hypothetical for that would be Music (TT) 18, in which case I need to turn that into Music 18.

So, here's my question - I need to remove: all characters that are ENCLOSED in parentheses, all characters that are after an open parenthesis if it's at the end of the line - like in Secretary 5, all parentheses, all multiple spaces changed to one space, and remove all trailing spaces which is currently anywhere from one to three of them after all the removals.

My final desired output based on the above:
Teacher 1
Secretary 5
Assistant 6
Principal 9
Custodian 12
Bus Driver 16
Music 18

I have successfully done a few of the above with reg ex, but right now it's a MESS and is multiple nested expressions that don't catch everything.

**NOTE: I can not change anything about this data. I have no control at all over what I am given, or the total lack of consistency in what is being appended to it. I have to take what I get and then deal with it. So please don't ask me why I'm getting data that looks like this. I am well aware that it makes no sense.

Upvotes: 0

Views: 50

Answers (1)

MT0
MT0

Reputation: 168470

I need to remove: all characters that are ENCLOSED in parentheses, all characters that are after an open parenthesis if it's at the end of the line - like in Secretary 5, all parentheses, all multiple spaces changed to one space, and remove all trailing spaces which is currently anywhere from one to three of them after all the removals.

You can use two passes of REGEXP_REPLACE:

  • First remove characters from an opening round bracket to either a closing round bracket (or the end of the string), inclusive.
  • Second to remove leading or trailing whitespace or to replace multiple whitespace characters with a single space.

Like this:

SELECT value,
       REGEXP_REPLACE(
         REGEXP_REPLACE(
           value,
           '\(.*?(\)|$)'
         ),
         '^\s+|\s+$|(\s){2,}',
         '\1'
       ) AS replaced_value
FROM   table_name

Which, for the sample data:

CREATE TABLE table_name (value) AS
SELECT 'Teacher 1 (L)' FROM DUAL UNION ALL
SELECT 'Teacher 1 (L) (H)' FROM DUAL UNION ALL
SELECT 'Secretary 5 (MN' FROM DUAL UNION ALL
SELECT 'Assistant 6(L)' FROM DUAL UNION ALL
SELECT 'Principal 9 (XYZ)(A)' FROM DUAL UNION ALL
SELECT 'Custodian 12' FROM DUAL UNION ALL
SELECT 'Bus Driver 16 (ABC DEF)' FROM DUAL UNION ALL
SELECT 'Music (TT) 18' FROM DUAL;

Outputs:

VALUE REPLACED_VALUE
Teacher 1 (L) Teacher 1
Teacher 1 (L) (H) Teacher 1
Secretary 5 (MN Secretary 5
Assistant 6(L) Assistant 6
Principal 9 (XYZ)(A) Principal 9
Custodian 12 Custodian 12
Bus Driver 16 (ABC DEF) Bus Driver 16
Music (TT) 18 Music 18

fiddle

Upvotes: 2

Related Questions