GeoffDS

Reputation: 1271

Teradata regular expressions, 0 or 1 spaces

In Teradata, I'm looking for one regular expression pattern that would allow me to find a pattern of some numbers, then a space or maybe no space, and then 'SF'. It should return 7 in both cases below:

SELECT
REGEXP_INSTR('12345 1000SF', pattern),
REGEXP_INSTR('12345 1000 SF', pattern)

My actual goal is to extract the 1000 in both cases, so if there's an easier way to do that (probably using REGEXP_SUBSTR), that would work too. More details are below if you need them.

I have a column that contains free text and I would like to extract the square footage. But, in some cases, there is a space between the number and 'SF' and in some cases there is not:

'other stuff 1000 SF'
'other stuff 1000SF'

I am trying to use the REGEXP_INSTR function to find the starting position. Through Google, I found that the pattern for the first case is

'([0-9])+ SF'

For the second case, I tried

'([0-9])+SF'

and I get the error

SELECT Failed.  [2662] SUBSTR: string subscript out of bounds

I've also found answers to similar questions, but they don't work in Teradata. For example, I don't think you can use ? in Teradata.

Upvotes: 2

Views: 2920

Answers (2)

dnoeth

Reputation: 60472

The error message indicates you're using SUBSTR, not REGEXP_SUBSTR.

Try this:

RegExp_Substr(col, '[0-9]*(?= {0,1}SF)')

This finds a run of digits followed by an optional single blank and then SF, and extracts only the digits, because the lookahead (?= ...) is not part of the match.
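
As a rough sketch, the pattern can be tried against the two sample strings from the question. The literals are the ones from the question; reusing the same lookahead pattern with REGEXP_INSTR is an assumption here rather than something stated in this answer, and the column aliases are made up for illustration.

```sql
SELECT
   -- digits that are followed by an optional blank and 'SF'
   REGEXP_SUBSTR('12345 1000SF',  '[0-9]*(?= {0,1}SF)') AS digits_no_space,
   REGEXP_SUBSTR('12345 1000 SF', '[0-9]*(?= {0,1}SF)') AS digits_with_space,
   -- assumption: the same pattern with REGEXP_INSTR gives the start of that match
   REGEXP_INSTR('12345 1000SF',  '[0-9]*(?= {0,1}SF)') AS pos_no_space,
   REGEXP_INSTR('12345 1000 SF', '[0-9]*(?= {0,1}SF)') AS pos_with_space;
```

Because [0-9]* also accepts zero digits, a text with an 'SF' that has no number in front of it can produce an empty match at that spot. If at least one digit is required, [0-9]+ is the safer choice.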

Upvotes: 2

linden2015

Reputation: 887

I would pattern it like this:

\b(\d+)\s*[Ss][Ff]\b

\b    # word boundary
(\d+) # 1 or more digits (captured)
\s*   # 0 or more white-space characters
[Ss]  # character class
[Ff]  # character class
\b    # word boundary
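
One caveat when carrying this over to Teradata: REGEXP_SUBSTR returns the whole match rather than a capture group, so (\d+) on its own would still bring the 'SF' along. A possible adaptation, assuming Teradata's engine accepts \b, \d and \s as written above, is to move the suffix into a lookahead and pass 'i' as the match argument instead of spelling out [Ss][Ff]. The table and column names below are hypothetical.

```sql
SELECT
   free_text,   -- hypothetical column holding the free text
   -- arguments: source, pattern, start position, occurrence, match argument
   -- 'i' requests case-insensitive matching, so 'sf' and 'SF' both qualify
   REGEXP_SUBSTR(free_text, '\b\d+(?=\s*SF\b)', 1, 1, 'i') AS square_feet
FROM listings;  -- hypothetical table
```

Rows without a matching number should simply yield no value for square_feet, which makes them easy to filter out afterwards.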

Upvotes: 2
