thaibythai
thaibythai

Reputation: 41

finding SSN using Oracle -sqldeveloper

I'm trying to write this query in Oracle -sqldeveloper, and I'm getting error message. this is my FIRST time working in Oracle, so please excuse me if I'm making a newbie mistakes.

I was ask to look at our db to see if end user by any chance enter their SSN number on this table. Some user might enter XXX-XX-1234, some might enter 123-34-567, some might enter XXXXXXXXX, some might enter 123456789 - I need to find ALL of them, I try this in SQL and it work but when i copy the code into oracle-sqldev it came back with error message and I try for a few days now to figure out what the heck is going on but I can't seen to figure it out.

here is there code i have in sql

select*
from ARTICLES 
where ARTICLE_BODY NOT LIKE '[0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9][0-9]'

Upvotes: 0

Views: 1110

Answers (2)

kevinskio
kevinskio

Reputation: 4551

You should read the documentation about Like here. You can use REGEXP_LIKE but this can also be done with other Oracle built-ins

SELECT *
FROM ARTICLES
WHERE LENGTH(TRANSLATE(ARTICLE_BODY,
'1234567890abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ-','1234567890')) > 8

This will give you all the article bodies that have more than 8 numbers in their text. You haven't given a lot of information about what can be entered and validating free text is hard but this should get you started.

However this will not work with CLOBS.

Operations involving patterns as parameters, such as COMPARE, INSTR, and SUBSTR do not support regular expressions or special matching characters (such as % in the LIKE operator in SQL) in the pattern parameter or substrings. From the documentation here

In order to do this will have to create a PL/SQL procedure to read 4000 character chunks, put it into a VARCHAR2 variable and then search in that.

Upvotes: 0

Gary_W
Gary_W

Reputation: 10360

Here's a try with a number of different test strings. Definitely add more as you see fit. Trying to select out of a block of text where the user can enter anything will never be 100%. This attempts to select ARTICLE_BODY where it contains a string of at least 9 but not more than 11 characters consisting of one or more X's or one or more digits or one or one or two optional dashes while treating the string multi-line and case insensitive (the where clause using REGEXP_LIKE). Then it surrounds the matches with triple square brackets to make the matches stand out in the output for you to verify. Note that line 9 is selected as it meets the criteria even though it is not a SSN. That will happen I'm afraid depending on what the user has entered. Make sure the consumer of this data knows these results can't be verified to be 100% accurate. Who knows what could be entered. Just make sure the regular expression handles as many conditions as you know can be identified. Good luck!

SQL> with articles(line, article_body) as (
   select 1, 'aaXXX-XX-1234aa' from dual union
   select 2, '123-45-6789'     from dual union
   select 3, '123456789'       from dual union
   select 4, 'XXXXXXXXX'       from dual union
   select 5, 'XXXXX.XXXX'      from dual union
   select 6, 'efs'             from dual union
   select 7, 'Some user might enter XXX-XX-1234, some might enter 123-34-567, some might enter XXXXXXXXX,
hem, I try this in SQL and it work but when i copy the code into oracle-sqldev it came back with error messa
he heck is going on but I cant seem to figure it out.' from dual union
   select 8, '123'             from dual union
   select 9, 'X12345678X'      from dual
   )
   select line, regexp_replace(article_body, '((X+|\d+|-{1,2}?){9,11})', '[[[\1]]]', 1, 0, 'im') Possible_ssns
   from articles
   where regexp_like(article_body, '(X+|\d+|-{1,2}?){9,11}', 'im')
   order by line;

      LINE POSSIBLE_SSNS
---------- --------------------------------------------------
         1 aa[[[XXX-XX-1234]]]aa
         2 [[[123-45-6789]]]
         3 [[[123456789]]]
         4 [[[XXXXXXXXX]]]
         7 Some user might enter [[[XXX-XX-1234]]], some migh
           t enter [[[123-34-567]]], some might enter [[[XXXX
           XXXXX]]], some might enter [[[123456789]]] - I nee
           d to find ALL of them, I try this in SQL and it wo
           rk but when i copy the code into oracle-sqldev it
           came back with error message and I try for a few d
           ays now to figure out what the heck is going on bu
           t I cant seem to figure it out.
         9 [[[X12345678X]]]

6 rows selected.

SQL>

Upvotes: 0

Related Questions