Reputation: 49
I have been struggling with something that should be quite simple for hours now and I would appreciate any advice that could help. I have a Postgres database with addresses, I have a field, building_name which actually contains in many cases, building or apartment numbers. These numbers may or may not be suffixed with a letter e.g. 32A, 24b etc. These combinations could be anywhere in the string including the start or end. They may be followed by whitespace or some other non alphanumeric separator such as a slash or dash. Some examples below:
I am trying to achieve this using a regular expression. The closest I can get is '(\d+\w+)'
which works for some of the above but does not work for:
'2/1' or '6 CAROLINA COURT' or '1/6 NEW ASSEMBLY CLOSE'
I have followed the advice here SQL split string at first occurance of a number but it does not work for my requirements.
Any advice would be hugely appreciated, I am completely stuck!
Many thanks in advance,
Mark
Upvotes: 1
Views: 457
Reputation: 4185
You're forcing a word character, when this is optional (and not catering for non alpha-numeric non-numerics).
So, assuming you're using POSIX regexes in PostGres, try something like this:
(\d+\w*)[ /\\\-]|$
making sure you capture group 1 as your output.
This involved a bit of guesswork, there aren't many PG-flavoured online testers.
Note it seems PostGres doesn't support Perl-flavoured regexes, so your \b
won't ever work here, hence me avoiding it.
Upvotes: 0
Reputation: 36199
Your regexp doesn't quite work because you use the +
qualifier, which searches for one or more letter. If you want to look for one or none, use the ?
qualifier: '\d+\w?'
.
Upvotes: 1
Reputation: 147
As mentioned by Nick B, it would be better to specify the RegEx implementation you are using. As a general answer though, you could try something like this:
(^|\s)(\d+[a-Z]?\b)
and take the second group from the result.
(^|\s)
matches the line start or a whitespace. This allowes to exclude from the output the number 1
in the 2/1
testcase.
Then \d+[a-Z]?
should match any sequence of at least one number followed by at most one letter.
Hope this helps!
Upvotes: 0