Maxim Seshuk

Reputation: 213

How can I optimize this regular expression?

I have a regular expression and would like to know whether it can be simplified.

preg_match_all('/([0-9]{2}\.[0-9]{2}\.[0-9]{4}) (([01]?[0-9]|2[0-3])\:[0-5][0-9]\:[0-5][0-9]?) поступление на сумму (\d+) WM([A-Z]) от корреспондента (\d+)/', $message->getMessageBody(), $info);

Upvotes: 0

Views: 114

Answers (2)

kalley

Reputation: 18462

I think this is the best you can do:

preg_match_all('/((?:\d\d\.){2}\d{4}) (([01]?\d|2[0-3])(?::[0-5]\d){1,2}) поступление на сумму (\d+) WM([A-Z]) от корреспондента (\d+)/', $message->getMessageBody(), $info);

If you don't need to match those exact words, you could replace them with \D+:

preg_match_all('/((?:\d\d\.){2}\d{4}) (([01]?\d|2[0-3])(?::[0-5]\d){1,2})\D+(\d+) WM([A-Z])\D+(\d+)/', $message->getMessageBody(), $info);
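
As a rough sketch of how the looser \D+ variant behaves, here it is run against a made-up message. The sample string is an assumption (the real text is whatever getMessageBody() returns), and the repeated ":MM" part is written as a non-capturing group (?:...) so that the amount, purse letter and correspondent stay in groups 4 to 6. Note that {1,2} also makes the seconds optional.

```php
<?php
// Hypothetical sample message, not taken from a real system.
$body = '21.03.2013 14:05:09 поступление на сумму 150 WMR от корреспондента 123456789012';

// \D+ stands in for the literal words between the numbers.
// (?:...) groups the repeated ":MM" part without adding a capture group.
$pattern = '/((?:\d\d\.){2}\d{4}) (([01]?\d|2[0-3])(?::[0-5]\d){1,2})\D+(\d+) WM([A-Z])\D+(\d+)/';

preg_match_all($pattern, $body, $info, PREG_SET_ORDER);

foreach ($info as $m) {
    // $m[1] date, $m[2] time, $m[3] hour, $m[4] amount, $m[5] purse letter, $m[6] correspondent
    echo "$m[1] $m[2]: $m[4] WM$m[5] from $m[6]\n";
}
```

PREG_SET_ORDER is used here only so that each match arrives as one array; the default ordering works just as well.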

Upvotes: 1

Martin Ender

Reputation: 44279

You can start by using free-spacing mode and some comments, which will help your understanding and everyone else's (and that makes simplifying easier). Note that literal spaces now have to go in character classes ([ ]), since bare whitespace in the pattern is ignored:

/
(             # group 1
  [0-9]{2}\.[0-9]{2}\.[0-9]{4}
              # match a date
)
[ ]
(             # group 2
  (           # group 3
    [01]?[0-9] # match an hour from 0 to 19
  |           # or
    2[0-3]    # match an hour from 20 to 23
  )
  \:       
  [0-5][0-9]  # minutes
  \:
  [0-5][0-9]? # seconds
)
[ ]поступление[ ]на[ ]сумму[ ]
              # literal text
(\d+)         # a number into group 4
[ ]WM         # literal text
([A-Z])       # a letter into group 5
[ ]от[ ]корреспондента[ ]
              # literal text
(\d+)         # a number into group 6
/x
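
The reason for writing spaces as [ ] can be seen in a small sketch. The two patterns and the test strings below are made up for illustration and are not part of the original expression.

```php
<?php
// Under the x modifier, bare whitespace in the pattern is ignored.
$bare    = '/WM [A-Z]/x';   // behaves like /WM[A-Z]/
// A space inside a character class is still matched literally.
$bracket = '/WM[ ][A-Z]/x';

var_dump(preg_match($bare, 'WM R'));    // int(0)
var_dump(preg_match($bare, 'WMR'));     // int(1)
var_dump(preg_match($bracket, 'WM R')); // int(1)
```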

The part at the end can't be simplified, unless you don't need to capture the parenthesised parts, in which case you can simply omit most of the parentheses.
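
One caveat, sketched below on the time part only: the hour alternation still needs to be grouped, because | has the lowest precedence, so where a capture isn't wanted a non-capturing group (?:...) can take its place. The shortened pattern and the sample input are assumptions for illustration.

```php
<?php
// With capturing groups: whole time in group 1, hour in group 2.
$capturing = '/(([01]?\d|2[0-3]):[0-5]\d)/';
// Same match, nothing captured: (?:...) only limits the scope of the alternation.
$nonCapturing = '/(?:[01]?\d|2[0-3]):[0-5]\d/';

preg_match($capturing, '23:59', $a);
preg_match($nonCapturing, '23:59', $b);

print_r($a); // entries: '23:59', '23:59', '23'
print_r($b); // single entry: '23:59'
```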

You can slightly shorten the expression by using \d as a substitute for [0-9], in which case \d\d is even shorter than \d{2}.

Next, there is no need to escape colons.

And finally, there seems to be something odd with your seconds. If you want to allow single-digit seconds, make the [0-5] optional, not the \d after it:

/
(             # group 1
  \d\d\.\d\d\.\d{4}
              # match a date
)
[ ]
(             # group 2
  (           # group 3
    [01]?\d   # match an hour from 0 to 19
  |           # or
    2[0-3]    # match an hour from 20 to 23
  )
  :       
  [0-5]\d     # minutes
  :
  [0-5]?\d    # seconds
)
[ ]поступление[ ]на[ ]сумму[ ]
              # literal text
(\d+)         # a number into group 4
[ ]WM         # literal text
([A-Z])       # a letter into group 5
[ ]от[ ]корреспондента[ ]
              # literal text
(\d+)         # a number into group 6
/x

I don't think it will get much simpler than that.
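
The difference between the two seconds sub-patterns can be checked in isolation. In this sketch the ^ and $ anchors are added only to test each fragment on its own, and the sample values are made up.

```php
<?php
$original = '/^[0-5][0-9]?$/'; // first digit required, second optional
$fixed    = '/^[0-5]?\d$/';    // tens digit optional, units digit required

foreach (['7', '07', '59', '60'] as $s) {
    printf("%-3s original=%d fixed=%d\n", $s, preg_match($original, $s), preg_match($fixed, $s));
}
```

The original form rejects a lone "7" because its first digit must be 0 to 5, while the corrected form accepts any single digit and still rejects "60".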

Upvotes: 0
