jkierzyk
jkierzyk

Reputation: 139

Using regexp_replace to remove letters from numbers

I've got a bunch of data with altitude - some of it just numbers, some include meters at the end or '. I also have few ranges 1200-1300 etc (I guess it the second problem would have to be solved a different way). I tried experimenting with regexp_replace but [^a-z] doesn't seem to be working. Any of you have a good idea on how to get rid of everything that's not a digit? Also, if you could recommend good website/book/course on how to clear data, I'd be much appreciated. Thanks!

Upvotes: 0

Views: 521

Answers (1)

user5683823
user5683823

Reputation:

Let's leave the ranges (like 1200-1300) to the side, since - even regardless of any kind of programming - it is not clear what you would want to "extract" from that. And, you may also have problems with things like '5 ft 10 in' or similar, if they are possible in your data. (And it is not clear what the whole thing means if all altitudes aren't using the same unit of measurement anyway - some are in meters, some in feet, the info disappears when you just keep the number).

To remove all the non-digits from a string and to keep the digits, you do NOT need regular expressions, which may be quite slow (an order of magnitude slower!) than standard string functions.

One way to remove all non-digit characters uses the TRANSLATE function. Like so:

translate(input_string, '0123456789' || input_string, '0123456789')

The function "translates" (replaces) 0 with 0, 1 with 1, etc., and any character in the input string that hasn't already appeared earlier in the second argument (which in this case means "non-digit") to nothing (null, zip, disappears, is removed).

Example (note the use of TO_NUMBER to also convert to actual numbers):

with
  data (input_string) as (
    select '1500'   from dual union all
    select '2100 m' from dual union all
    select '535 ft' from dual
  )
select input_string,
       to_number(translate(input_string, '0123456789' || input_string, 
                                         '0123456789')) as extracted_number
from   data;

INPUT_STRING EXTRACTED_NUMBER
------------ ----------------
1500                     1500
2100 m                   2100
535 ft                    535

Upvotes: 3

Related Questions