hkutluay
hkutluay

Reputation: 6944

Extract email from field using Oracle Regexp

I want to get e-mail formed texts in a field. I have tried sql below but no luck. See SqlFiddle. Removing ^ and $ from regexp not working too.

WITH TEST_DATA AS (
  SELECT '[email protected]' AS EMAIL FROM DUAL UNION ALL 
  SELECT 'mail [email protected]' FROM DUAL UNION ALL           
  SELECT 'mail [email protected] sent' FROM DUAL UNION ALL                
  SELECT '[email protected] sent count 23' FROM DUAL UNION ALL          
  SELECT 'mail already sent to [email protected] and [email protected]' FROM DUAL UNION ALL                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   
  SELECT '[email protected] sent count 23' FROM DUAL             
)SELECT REGEXP_SUBSTR(EMAIL,'^[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}$') MAIL
 FROM TEST_DATA;

Expected output for this dataset

[email protected] 
[email protected] 
[email protected] 
[email protected] 
[email protected], [email protected] 
[email protected]

Any help appreciated.

Upvotes: 3

Views: 9315

Answers (2)

Noel
Noel

Reputation: 10525

If you want to extract multiple mail ids in a single column, you can use REGEXP_REPLACE function.

Assuming all the ids in your data are valid ones,

REGEXP_REPLACE (EMAIL, '(\w+@\w+\.\w+ ?)|(.)', '\1')

This removes all other text except for mail ids that are separated by at least a space.

You can then remove any trailing spaces and add comma to separate multiple ids.

REPLACE (TRIM (REGEXP_REPLACE (EMAIL, '(\w+@\w+\.\w+ ?)|(.)', '\1')),
            ' ',
            ', ')

Example:

WITH TEST_DATA
     AS (SELECT '[email protected]' AS EMAIL FROM DUAL
         UNION ALL
         SELECT 'mail [email protected]' FROM DUAL
         UNION ALL
         SELECT 'mail [email protected] sent to [email protected] and [email protected]' FROM DUAL
         UNION ALL
         SELECT '[email protected] sent count 23 and [email protected]' FROM DUAL
         UNION ALL
         SELECT 'mail already sent to [email protected] and [email protected]' FROM DUAL
         UNION ALL
         SELECT '[email protected] sent count 23' FROM DUAL)
SELECT REPLACE (TRIM (REGEXP_REPLACE (EMAIL, '(\w+@\w+\.\w+ ?)|(.)', '\1')),
                ' ',
                ', ')
          MAIL
  FROM TEST_DATA;

MAIL
-----------------------------
[email protected]
[email protected]
[email protected], [email protected], [email protected]
[email protected], [email protected]
[email protected], [email protected]
[email protected]

Upvotes: 5

Michael Ford
Michael Ford

Reputation: 861

You are close! try this

SELECT REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') MAIL

edited:

Maybe this helps:

WITH TEST_DATA AS (
  SELECT '[email protected]' AS EMAIL FROM DUAL UNION ALL 
  SELECT 'mail [email protected]' FROM DUAL UNION ALL           
  SELECT 'mail [email protected] sent' FROM DUAL UNION ALL                
  SELECT '[email protected] sent count 23' FROM DUAL UNION ALL          
  SELECT 'mail already sent to [email protected] and [email protected]' FROM DUAL UNION ALL 
  SELECT '[email protected] sent count 23' FROM DUAL             
)SELECT REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') MAIL,
        REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}',1,2) MAIL2
 FROM TEST_DATA

I don't see a way to report 'n' number of matches. I also do not realize how to insert a comma and output into one column. I would bet that if possible, the query will be become quite complex with multiple inner selects/finds/replaces occuring. A better solution may be to return the original result to another language for parsing or to perform such parsing using pl/sql.

Another edit:

Here is what I meant regarding the inner selects. Exact solution to the asked question :-)

select CASE WHEN MAIL2 is not null THEN mail||', '||mail2 ELSE mail END as mail
from (
    WITH TEST_DATA AS (
      SELECT '[email protected]' AS EMAIL FROM DUAL UNION ALL 
      SELECT 'mail [email protected]' FROM DUAL UNION ALL           
      SELECT 'mail [email protected] sent' FROM DUAL UNION ALL                
      SELECT '[email protected] sent count 23' FROM DUAL UNION ALL          
      SELECT 'mail already sent to [email protected] and [email protected]' FROM DUAL UNION ALL 
      SELECT '[email protected] sent count 23' FROM DUAL             
    )SELECT REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}') MAIL,
            REGEXP_SUBSTR(EMAIL,'[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,4}',1,2) MAIL2
     FROM TEST_DATA
)

I also stubled upon this Oracle articale which discusses e-mail matching at point 8. It might be worth a peek. http://www.orafaq.com/node/2404

Upvotes: 3

Related Questions