mrpatg

Reputation: 10117

PHP Regex matching only some letters/words and punctuation

Not sure if this is the right approach, so chime in if you have something better:

I have a series of data codes that I need to match against. The codes are scraped and split off from surrounding text, but their location and tagging within that text is only about 70% consistent. I figure regex might be a better approach for catching not just the outliers but all of them, since the codes follow a fairly standard format, but I'm not sure how to target strings that contain only certain letters, parentheses, asterisks, etc. Here are my test examples:

3-301.11(C)*
3-501.16 (Cold)
5-202.11(A)
3-501.16 (Hot)
6-501.111(C)*
7-201.11(A)*

Most of the codes come back fine as:

5-103.11

I am able to use the expression ^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9] to target most of these, but the endings (the parenthesized part and the asterisk) are throwing me off.

I have the samples set up here:

https://regexr.com/3smmj

EDIT

Just tried out Frank's solution of adding (.*) to my expression, which worked but exposed a new issue. Since these codes are embedded in text, my test samples should have included additional text after the codes. I have updated the link/test examples.
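To illustrate the issue described in this edit, here is a minimal sketch. It assumes (.*) was simply appended to the original expression, which may not be exactly what Frank suggested, and the sample line is made up for illustration. Because .* is greedy, it swallows everything up to the end of the line, so the trailing text comes along with the code.

```php
<?php
// Assumption: the original expression with (.*) appended, wrapped in ~ delimiters.
$withDotStar = '~^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9](.*)~';

// Hypothetical scraped line: a code followed by unrelated text.
$line = '3-501.16 (Cold) Cold holding above limit';

preg_match($withDotStar, $line, $m);

// $m[0] is the whole match, $m[1] is what (.*) captured.
echo $m[0], PHP_EOL;
echo $m[1], PHP_EOL;
```

Here the full match is the entire line rather than just 3-501.16 (Cold), which is why a more specific ending is needed.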

SOLUTION Thanks to everyone for the help. I updated the link with the (now) working solution.

^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?
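As a rough check of the pattern above, the following sketch runs it against a few lines that have extra text after the code. The sample lines and the variable names ($pattern, $lines, $codes) are illustrative assumptions, not part of the real data.

```php
<?php
// The solution pattern, wrapped in ~ delimiters for PCRE.
$pattern = '~^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~';

// Hypothetical scraped lines: code first, then unrelated text.
$lines = [
    '3-301.11(C)* Food handled with bare hands',
    '3-501.16 (Cold) Cold holding above limit',
    '5-103.11 Water capacity insufficient',
];

$codes = [];
foreach ($lines as $line) {
    // preg_match returns 1 on a match; $m[0] holds the matched code.
    if (preg_match($pattern, $line, $m) === 1) {
        $codes[] = $m[0];
    }
}

print_r($codes);
```

The optional group only matches when a parenthesized part directly follows the number (allowing whitespace in between), so the plain 5-103.11 form is returned without any trailing text.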

Upvotes: 1

Views: 176

Answers (2)

Mitya

Reputation: 34628

From the codes you've shown you can simplify this pattern quite a bit:

/\d+-\d+\.\d+/

Explanation:

  • one or more digits
  • then a hyphen
  • one or more digits
  • then a period
  • one or more digits

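Note there is no need to put single characters in [] as you did. [-] still matches a literal hyphen, because - is only treated as a range operator when it sits between two characters, e.g. [0-9], but the brackets add nothing. Also note that this pattern matches only the numeric part of the code, not a trailing (C)* or (Cold).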
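A small sketch comparing the simplified pattern with the expression from the question on one of the sample codes. The variable names are arbitrary. Both return the same numeric part here; the original is stricter about digit counts (it needs at least two digits in the second and third groups) and looser about repeated hyphens and periods.

```php
<?php
$simple   = '/\d+-\d+\.\d+/';
$original = '/^[0-9]+[-]+[0-9]+[0-9]+[.]+[0-9]+[0-9]/';

$sample = '6-501.111(C)*';

// Each call stores the full match in index 0 of the third argument.
preg_match($simple, $sample, $a);
preg_match($original, $sample, $b);

echo $a[0], PHP_EOL;
echo $b[0], PHP_EOL;
```

Neither pattern picks up the (C)* ending, which is the part the question is about.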

Upvotes: 1

Wiktor Stribiżew

Reputation: 627537

You may use

'~^\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~m'

Or, to match anywhere on a line:

'~\b\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~'

See the regex demo

Details

  • ^ - start of a line (replace with \b word boundary if you need to match anywhere on a line)
  • \d+ - 1+ digits
  • - - a hyphen
  • \d+\.\d+ - 1+ digits, . and 1+ digits
  • (?:\s*\([^()]*\)\*?)? - an optional sequence of patterns matching
    • \s* - 0+ whitespaces
    • \( - a (
    • [^()]* - 0+ chars other than ( and )
    • \) - a )
    • \*? - an optional asterisk symbol.

Example of PHP code:

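// $s holds the scraped text; this sample value is illustrative only.
$s = 'Violation 3-501.16 (Cold) observed; see also 7-201.11(A)*.';
if (preg_match_all('~\b\d+-\d+\.\d+(?:\s*\([^()]*\)\*?)?~', $s, $matches)) {
    // $matches[0] lists every full match found in $s.
    print_r($matches[0]);
}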
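To make the difference between the two variants concrete, here is a sketch that runs both on the same multi-line input. The sample text and the names $text, $suffix, $atLineStart and $anywhere are made up for illustration.

```php
<?php
// Hypothetical input: the second line has text before the code.
$text = "3-301.11(C)* first note\n"
      . "see also 7-201.11(A)* in the report\n"
      . "5-202.11(A)";

// Shared optional ending: whitespace, a parenthesized part, an optional asterisk.
$suffix = '(?:\s*\([^()]*\)\*?)?';

// With the m flag, ^ matches at the start of every line.
preg_match_all('~^\d+-\d+\.\d+' . $suffix . '~m', $text, $atLineStart);

// With \b, the code may appear anywhere on a line.
preg_match_all('~\b\d+-\d+\.\d+' . $suffix . '~', $text, $anywhere);

print_r($atLineStart[0]);
print_r($anywhere[0]);
```

The anchored version skips 7-201.11(A)* because that code does not start its line, while the \b version finds all three.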

Upvotes: 1
