Not sure if this is the right approach, so chime in if you have something better:
I have a series of data codes that I need to match against. The codes are scraped and split off from other text, but their location and tagging relative to that text is only about 70% consistent. Since the codes follow a fairly standard format, I figure a regex might be a better way to catch not just the outliers but all of them. What I'm not sure about is how to target strings that end with only certain letters, parentheses, asterisks, etc. Here are my test examples:
3-301.11(C)*
3-501.16 (Cold)
5-202.11(A)
3-501.16 (Hot)
6-501.111(C)*
7-201.11(A)*
Most of the codes come back fine as:
5-103.11
I can match most of these with the expression ^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9], but the endings (the parenthesized letters and trailing asterisks) are throwing me off.
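A quick PHP check (a sketch, using a subset of the test examples above) shows what the original expression actually captures: the numeric part of each code matches, but the match stops there, so the parenthesized and asterisk endings are never included.

```php
<?php
// The original pattern from the question, copied verbatim.
$original = '/^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9]/';

$samples = ['3-301.11(C)*', '3-501.16 (Cold)', '6-501.111(C)*', '5-103.11'];

$results = [];
foreach ($samples as $code) {
    // preg_match returns 1 on a match and puts the matched text in $m[0].
    if (preg_match($original, $code, $m) === 1) {
        $results[$code] = $m[0];
    }
}
print_r($results); // each value is only the numeric part, e.g. '3-301.11'
```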
I have the samples set up here:
EDIT
I just tried Frank's suggestion of appending (.*) to my expression, which worked but opened a new issue: since these codes are embedded in text, my test samples should have included additional text after the codes. I have updated the link and the test examples.
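To illustrate the new issue, here is a minimal sketch of the original pattern with (.*) appended, run against a hypothetical line where a code is followed by unrelated text (the trailing text is made up for illustration). Because .* is greedy, the match runs to the end of the line.

```php
<?php
// Original pattern with (.*) appended, as described in the EDIT.
$withDotStar = '/^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9](.*)/';

// Hypothetical line: a code followed by ordinary text on the same line.
$line = '3-501.16 (Cold) Hypothetical trailing inspection note';

preg_match($withDotStar, $line, $m);
// '.' matches any character except a newline and .* is greedy,
// so the trailing text ends up in the match as well.
echo $m[0], PHP_EOL;
```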
SOLUTION: Thanks to everyone for the help. I updated the link with the (now) working solution:
^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?
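Here is a minimal sketch applying the solution pattern with PHP's preg_match, one line at a time. The trailing text on each line is a hypothetical stand-in for the surrounding scraped content.

```php
<?php
// The working pattern from the SOLUTION above.
$pattern = '/^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?/';

// The test examples, each followed by hypothetical trailing text.
$lines = [
    '3-301.11(C)* trailing text',
    '3-501.16 (Cold) trailing text',
    '5-202.11(A) trailing text',
    '3-501.16 (Hot) trailing text',
    '6-501.111(C)* trailing text',
    '7-201.11(A)* trailing text',
    '5-103.11 trailing text',
];

$codes = [];
foreach ($lines as $line) {
    if (preg_match($pattern, $line, $m) === 1) {
        $codes[] = $m[0]; // only the code; the trailing text is not matched
    }
}
print_r($codes);
```

The optional group is what picks up endings like (C)* or (Cold); when it cannot match, as for 5-103.11, the bare numeric code is still returned.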
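From the codes you've shown, you can simplify this pattern quite a bit:
/\d+-\d+\.\d+/
Explanation: \d matches a digit, + means "one or more", - is a literal hyphen, and \. is a literal dot.
Note that there is no need to put single characters in [] as you did. Inside [], a - normally acts as a range operator, as in [0-9], so [-] is easy to misread. (PCRE treats a hyphen that is the first or last character of a class as a literal, so [-] does match a hyphen, but a bare - is clearer.) Also, the + after [-] and [.] lets your pattern accept repeated hyphens or dots.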
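Two details of this answer can be checked directly. The sketch below assumes PHP's PCRE functions; the malformed input string is hypothetical, and both patterns are anchored with ^ here only so they are compared on the same footing.

```php
<?php
// A hyphen that is the only character in a class is a literal, not a range.
$hyphenIsLiteral = preg_match('/^[-]$/', '-');

// Original pattern vs. the simplified one, both anchored with ^.
$original   = '/^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9]/';
$simplified = '/^\d+-\d+\.\d+/';

// Hypothetical malformed input with doubled separators.
$malformed = '3--301..11';

$originalHit   = preg_match($original, $malformed);   // 1: [-]+ and [.]+ allow repeats
$simplifiedHit = preg_match($simplified, $malformed); // 0: exactly one - and one .

var_dump($hyphenIsLiteral, $originalHit, $simplifiedHit);
```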
You may use
'~^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~m'
Or, to match anywhere on a line:
'~\b\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~'
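The difference between the two variants shows up when codes appear mid-line. In this sketch the input text is hypothetical; with the m modifier, ^ matches only at the start of each line, whereas \b lets the pattern match anywhere.

```php
<?php
// Hypothetical input: two codes mid-line, one at the start of a line.
$text = "See 3-501.16 (Hot) and 7-201.11(A)*\n5-202.11(A) text";

$lineStart = '~^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~m';
$anywhere  = '~\b\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~';

preg_match_all($lineStart, $text, $a);
preg_match_all($anywhere, $text, $b);

print_r($a[0]); // only the code that starts a line
print_r($b[0]); // every code in the text
```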
See the regex demo
Details:
* ^ : start of a line (replace with \b, a word boundary, if you need to match anywhere on a line)
* \d+ : 1+ digits
* - : a hyphen
* \d+\.\d+ : 1+ digits, a literal dot, and 1+ digits
* (?:\s*\([^()]*\)\*?)? : an optional sequence of patterns matching:
  * \s* : 0+ whitespace characters
  * \( : a (
  * [^()]* : 0+ characters other than ( and )
  * \) : a )
  * \*? : an optional asterisk
Example of PHP code:
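$s = "3-301.11(C)* some text\n3-501.16 (Cold) more text"; // sample input
if (preg_match_all('~\b\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~', $s, $matches)) {
    print_r($matches[0]); // full matches, i.e. the extracted codes
}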
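The [^()]* part is what keeps the optional group from running past the first closing parenthesis. Here is a sketch comparing it with a greedy .* in the same position (the input line is hypothetical):

```php
<?php
// Hypothetical line with a second parenthesized remark after the code.
$line = '3-501.16 (Cold) held below 41F (see note)';

$negatedClass = '~\b\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~';
$greedyDot    = '~\b\d+-\d+\.\d+(?:\s*\(.*\)\*?)?~';

preg_match($negatedClass, $line, $m1);
preg_match($greedyDot, $line, $m2);

echo $m1[0], PHP_EOL; // stops at the first closing parenthesis
echo $m2[0], PHP_EOL; // .* runs on to the last ')' on the line
```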