Reputation: 1
I need a regular expression (ideally PHP-compatible) that finds all numbers preceded by a word boundary, an equal sign (=), or a colon (:), but ignores percentages (digits followed by a % sign), times, dates, and ISO 8859-1 Symbol Entity Numbers (numeric entities of the form &#NNN;).
I have been using the following, but it does not always work:
/(^:|\b|=|^&)([0-9]*[0-9.]*[0-9]+)(^%:;)?
Upvotes: 0
Views: 1367
Reputation: 2205
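Your regexp is seriously broken:
- Outside a character class, "^" is a start-of-string anchor, not a negation. "^:" and "^&" only match at the very start of the string, and "(^%:;)?" is an optional group that can never match after a number, so it excludes nothing.
- "[0-9]*[0-9.]*[0-9]+" also accepts strings with several dots, such as "1.2.3".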
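Running the question's pattern on a few inputs shows how little it actually excludes. This is a sketch in Python's re module, on the assumption that it treats these particular constructs ("^", "\b", character classes, greedy quantifiers) the same way PHP's PCRE-based preg functions do; the helper name numbers is made up for illustration.

```python
import re

# The question's pattern with the leading "/" delimiter dropped.
QUESTION_RE = re.compile(r"(^:|\b|=|^&)([0-9]*[0-9.]*[0-9]+)(^%:;)?")


def numbers(text):
    """Return the number captured by group 2 of every match."""
    return [m.group(2) for m in QUESTION_RE.finditer(text)]


print(numbers("12%"))     # ['12']: the percentage is still matched
print(numbers("12:30"))   # ['12', '30']: so is each part of the time
print(numbers("&#160;"))  # ['160']: and the entity number
print(numbers("1.2.3"))   # ['1.2.3']: several dots are accepted
```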
I strongly recommend reading a good regular expression reference; "man perlre" was my source many years ago, but I'm sure there are better ones now.
The following should do what you want, assuming that:
- numbers start AND end on a word boundary;
- numbers have no thousands separators and use a dot as the decimal separator;
- times and dates are sequences of numbers separated by ":", "-", or "/", and every such sequence is a time or a date.
It should be easy to improve on this if these assumptions are not correct.
/\b(?<!&#|\d[:\/-])(\d+(?:\.\d+)?)(?!%|[:\/-]\d)\b/
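Explanation:
- \b: the number must start on a word boundary. Since "=" and ":" are not word characters, this already covers numbers preceded by "=" or ":".
- (?<!&#|\d[:\/-]): negative lookbehind; the number must not be preceded by "&#" (an entity number) or by a digit followed by ":", "/", or "-" (a later part of a time or date).
- (\d+(?:\.\d+)?): captures one or more digits, optionally followed by a dot and more digits.
- (?!%|[:\/-]\d): negative lookahead; the number must not be followed by "%" (a percentage) or by ":", "/", or "-" plus a digit (an earlier part of a time or date).
- \b: the number must end on a word boundary.
The "/" inside the character classes is escaped only because "/" is also the pattern delimiter.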
Note that I'm also assuming that you don't have numbers preceded by "&#" but not followed by ";". Writing the regexp for the case where this assumption doesn't hold is a harder problem.
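The question asks for PHP, where you would pass this pattern to preg_match_all. As a rough sketch, the equivalent in Python's re module looks like this; the function name find_numbers and the sample strings are illustrative, not part of the answer. In a Python raw string the "/" delimiters are dropped, so the slashes inside the classes no longer need escaping.

```python
import re

# The answer's pattern without PHP's "/.../" delimiters. In a Python raw
# string "/" is an ordinary character, so it needs no escaping.
NUMBER_RE = re.compile(r"\b(?<!&#|\d[:/-])(\d+(?:\.\d+)?)(?!%|[:/-]\d)\b")


def find_numbers(text):
    """Return every standalone number in text as a list of strings."""
    # With exactly one capture group, re.findall returns that group's text
    # for each match, much like $matches[1] from preg_match_all in PHP.
    return NUMBER_RE.findall(text)


sample = "width=12 height:7.5 at 12:30 on 2024-01-15, 50% off&#160;"
print(find_numbers(sample))  # ['12', '7.5']

# Inputs that break the stated assumptions are split, not rejected.
print(find_numbers("1,234 and 3,5"))  # ['1', '234', '3', '5']
```

The last line shows why the assumptions matter: with thousands separators or a comma as decimal separator, each digit run is reported as a separate number.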
Test:
$ pcretest
PCRE version 7.8 2008-09-05
re> /\b(?<!&#|\d[:\/-])(\d+(?:\.\d+)?)(?!%|[:\/-]\d)\b/g
data> a12
No match
data> a 12
0: 12
1: 12
data> 12-12
No match
data> 12:12
No match
data> 12 23
0: 12
1: 12
0: 23
1: 23
data> 
No match
data> :12
0: 12
1: 12
data> =12
0: 12
1: 12
data> 12/12
No match
data> 12%
No match
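The same checks can be replayed in Python as a quick regression test; this is a sketch, and &#160; is an assumed example entity input. One case the session doesn't cover is a decimal percentage such as 12.5%: once the lookahead rejects "12.5", the regex backtracks out of the optional decimal part and matches "12" instead. Adding \.\d to the negative lookahead (TWEAKED_RE below) is one possible fix, not part of the original answer.

```python
import re

ANSWER_RE = re.compile(r"\b(?<!&#|\d[:/-])(\d+(?:\.\d+)?)(?!%|[:/-]\d)\b")
# Hypothetical tweak (not from the answer): also refuse to stop right before
# ".<digit>", so the regex cannot backtrack out of a decimal part.
TWEAKED_RE = re.compile(r"\b(?<!&#|\d[:/-])(\d+(?:\.\d+)?)(?!%|\.\d|[:/-]\d)\b")

# Inputs and expected matches from the pcretest session, plus an
# assumed entity example.
CASES = {
    "a12": [],
    "a 12": ["12"],
    "12-12": [],
    "12:12": [],
    "12 23": ["12", "23"],
    ":12": ["12"],
    "=12": ["12"],
    "12/12": [],
    "12%": [],
    "&#160;": [],
}

for text, expected in CASES.items():
    assert ANSWER_RE.findall(text) == expected, text
    assert TWEAKED_RE.findall(text) == expected, text

# A decimal percentage: the original pattern backtracks and keeps "12".
print(ANSWER_RE.findall("12.5%"))    # ['12']
print(TWEAKED_RE.findall("12.5%"))   # []
print(TWEAKED_RE.findall("x=12.5"))  # ['12.5']
```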
Upvotes: 1