Jean-Philippe Martin
Jean-Philippe Martin

Reputation: 911

Oracle regex to list unique characters difference between two strings

In Oracle 10g, I'd like to create a regular expression to list the characters that are different between two strings.

Here is the reason: I have a table with a field that contains sometimes Unicode characters that are not in the French language.

I am able to list the rows containing these nonstandards characters to make a future cleanup with this query:

SELECT DataID, Name, CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC')  
  FROM table
  WHERE NAME <> CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC' )

where WE8ISO8859P1 - West European (that I accept)

and WE8DEC - 8-bit character sets from Digital Equipment Corporation (that I know that the application support)

I imagine that with an Oracle regular expression I would be able to extract the list of all these nonstandards characters. But I'm not familiar with regexp in Oracle so any help would be appreciated.

Here is my (not working) idea :

select regexp_replace("éaé", '[a-z][A-Z]', '' ) from dual;

would give "é" as a character to cleanup.

Upvotes: 3

Views: 3515

Answers (3)

shaswat kumar
shaswat kumar

Reputation: 1

SELECT LISTAGG( letter, '' ) WITHIN GROUP(ORDER BY letter) 
       FROM ( SELECT DISTINCT substr( 'aaaaabcde', level, 1 ) letter 
                     FROM dual CONNECT BY level <= length('aaaaabcde') )

Upvotes: -1

Plasmer
Plasmer

Reputation: 1130

I think your only mistake was using double quotes for the first parameter in your question. Typically double quotes are only used for column/table names.

The parantheses that DCookie added make the regex match multiple times. Depending on what is in the name field, you may need to add more characters to the replace clause, including puncuation (use two single quotes to match a quote). Here's a start for you:

SELECT regexp_replace(name, '([a-zA-Z ,.;''"])\-?','') problem_characters , count(*) 
FROM table
  WHERE NAME <> CONVERT(NAME, 'WE8ISO8859P1', 'WE8DEC' )
group by regexp_replace(name, '([a-zA-Z ,.;''"])\-?','');

Upvotes: 0

DCookie
DCookie

Reputation: 43533

Perhaps something like this might get you going in the right direction:

SQL> select regexp_replace('éaéABcdEF', '([a-zA-Z])', '' ) problems from dual;

PROBLEMS
--------
éé

It gives you each occurrence of the characters you want to identify, but perhaps that's not a problem or you can refine it...

Upvotes: 3

Related Questions