KF-SoftwareDev

Reputation: 403

Regex Pattern where group may not exist

I have a RegEx pattern that needs to match on any of the following lines:

10-10-15 15:16:41.1 Some Text here 
10-10-15 15:16:41.12 Some Text here 
10-10-15 15:16:41.123 Some Text here 
10-10-15 15:16:41 Some Text here 

I can match the first 3 with the pattern below:

(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})\.(?<milli>\d{0,3})))\s(?<Line>.*)

How do I match this line (10-10-15 15:16:41 Some Text here), which has no milliseconds, and still get the milli group back in my result, either with a blank value or with 0 as the value?

Thanks

As I said, each of the lines below will match:

10-10-15 15:16:41.123 Some text Here
10-10-15 15:16:41.12 Some Text here 
10-10-15 15:16:41.1 Some Text here 
10-10-15 15:16:41. Some Text here 

The groups look like so:

date    [0-18]  `10-10-15 15:16:41.`
day     [0-2]   `10`
month   [3-5]   `10`
year    [6-8]   `15`
time    [9-18]  `15:16:41.`
hour    [9-11]  `15`
minutes [12-14] `16`
seconds [15-17] `41`
milli   [18-18] ``
Line    [19-34] `Some Text here `

Upvotes: 1

Views: 3264

Answers (4)

karthik manchala

Reputation: 13640

You can use the following (slightly modified version of your regex):

(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.\d{0,3})?))\s(?<logEntry>.*)


Explanation:

  • Make the whole <milli> part (the . together with its digits) optional, rather than only the . on its own. Making just the . optional would also match strings like 10-10-15 15:16:41123 Some Text here.
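As a rough check of that behaviour, here is a minimal Python sketch. Python's `re` module writes named groups as `(?P<name>...)` instead of `(?<name>...)`, so the pattern is a translation of the one above, and `parse_milli` is a hypothetical helper added only for illustration. In Python an optional group that did not take part in the match comes back as `None`, which the helper maps to "0".

```python
import re

# Python translation of the pattern above: (?<name>...) becomes (?P<name>...).
PATTERN = re.compile(
    r"(?P<date>(?P<day>\d{1,2})-(?P<month>\d{1,2})-(?P<year>(?:\d{4}|\d{2}))\s"
    r"(?P<time>(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?P<milli>\.\d{0,3})?))\s(?P<logEntry>.*)"
)

def parse_milli(line):
    """Return the millisecond digits as a string, "0" if absent, None if no match.

    Hypothetical helper, not part of the original answer.
    """
    m = PATTERN.match(line)
    if m is None:
        return None
    milli = m.group("milli")            # None when the optional group is skipped
    digits = (milli or "").lstrip(".")  # drop the leading dot captured by the group
    return digits or "0"

lines = [
    "10-10-15 15:16:41.1 Some Text here ",
    "10-10-15 15:16:41.12 Some Text here ",
    "10-10-15 15:16:41.123 Some Text here ",
    "10-10-15 15:16:41 Some Text here ",
]
results = [parse_milli(line) for line in lines]
print(results)  # ['1', '12', '123', '0']

# The malformed timestamp without a dot is rejected.
rejected = parse_milli("10-10-15 15:16:41123 Some Text here")
print(rejected)  # None
```

Because the dot lives inside the optional group, the digits can only appear when the dot does, which is what rules out 15:16:41123.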

Upvotes: 2

Pedro Lobito

Reputation: 98921

Make the milliseconds optional:

/^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$/

Example:

<?php

$strings = <<< LOL
10-10-15 15:16:41.1 Some Text here 
10-10-15 15:16:41.12 Some Text here 
10-10-15 15:16:41.123 Some Text here 
10-10-15 15:16:41 Some Text here 
LOL;

// Groups 1-3: day, month, year; 4-6: hours, minutes, seconds; 7: milliseconds (optional); 8: text.
// The m flag makes ^ and $ match at the start and end of every line in $strings.
preg_match_all('/^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$/m', $strings, $matches, PREG_PATTERN_ORDER);
for ($i = 0; $i < count($matches[0]); $i++) {

    $day = $matches[1][$i];
    $month = $matches[2][$i];
    $year = $matches[3][$i];
    $hours = $matches[4][$i];
    $minutes = $matches[5][$i];
    $seconds = $matches[6][$i];
    $ms = $matches[7][$i]; // empty string when the line has no milliseconds
    $text = $matches[8][$i];

    echo "$day $month $year $hours $minutes $seconds $ms $text \n";
}

Regex Demo:

https://regex101.com/r/aF9wN6/1


PHP Demo:

http://ideone.com/1aEt2E


Regex Explanation:

^([\d]{2})-([\d]{2})-([\d]{2}|[\d]{4})\s+([\d]{2}):([\d]{2}):([\d]{2})\.?(\d+)?\s+(.*?)$

Assert position at the beginning of a line (at beginning of the string or after a line break character) (line feed) «^»
Match the regex below and capture its match into backreference number 1 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “-” literally «-»
Match the regex below and capture its match into backreference number 2 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “-” literally «-»
Match the regex below and capture its match into backreference number 3 «([\d]{2}|[\d]{4})»
   Match this alternative (attempting the next alternative only if this one fails) «[\d]{2}»
      Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
         Exactly 2 times «{2}»
   Or match this alternative (the entire group fails if this one fails to match) «[\d]{4}»
      Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{4}»
         Exactly 4 times «{4}»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, form feed) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 4 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “:” literally «:»
Match the regex below and capture its match into backreference number 5 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “:” literally «:»
Match the regex below and capture its match into backreference number 6 «([\d]{2})»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «[\d]{2}»
      Exactly 2 times «{2}»
Match the character “.” literally «\.?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
Match the regex below and capture its match into backreference number 7 «(\d+)?»
   Between zero and one times, as many times as possible, giving back as needed (greedy) «?»
   Match a single character that is a “digit” (any decimal number in any Unicode script) «\d+»
      Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match a single character that is a “whitespace character” (any Unicode separator, tab, line feed, carriage return, form feed) «\s+»
   Between one and unlimited times, as many times as possible, giving back as needed (greedy) «+»
Match the regex below and capture its match into backreference number 8 «(.*?)»
   Match any single character that is NOT a line break character (line feed) «.*?»
      Between zero and unlimited times, as few times as possible, expanding as needed (lazy) «*?»
Assert position at the end of a line (at the end of the string or before a line break character) (line feed) «$»

Upvotes: 0

dawg

Reputation: 103844

^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d*)([a-zA-Z\s]+)

Note the (\d*), which returns the group even when it is empty.
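A small Python sketch of that point, using the pattern exactly as written (it has only numbered groups, so no translation is needed). The two sample lines come from the question; the variable names are illustrative.

```python
import re

# The pattern from this answer, unchanged.
PATTERN = re.compile(r"^(\d+)-(\d+)-(\d+)\s(\d+):(\d+):(\d+)\.?(\d*)([a-zA-Z\s]+)")

with_ms = PATTERN.match("10-10-15 15:16:41.123 Some Text here")
without_ms = PATTERN.match("10-10-15 15:16:41 Some Text here")

ms_present = with_ms.group(7)    # digits after the dot
ms_absent = without_ms.group(7)  # group still exists, but holds an empty string

print(repr(ms_present))  # '123'
print(repr(ms_absent))   # ''
```

Two things to keep in mind with this variant: the last group starts with the separating space (here " Some Text here"), and because it only accepts letters and whitespace, it stops at the first digit or punctuation character in the log text.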


Upvotes: 0

KF-SoftwareDev

Reputation: 403

Worked it out. I needed the following pattern:

(?<date>(?<day>\d{1,2})-(?<month>\d{1,2})-(?<year>(?:\d{4}|\d{2}))\s(?<time>(?<hour>\d{2}):(?<minutes>\d{2}):(?<seconds>\d{2})(?<milli>\.?\d{0,3})))\s(?<logEntry>.*)
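For anyone checking this pattern, a minimal Python sketch of what the milli group holds in each case. The named groups are rewritten as `(?P<name>...)` because Python's `re` does not accept `(?<name>...)`, and the sample list is only illustrative. Since both the . and the digits are optional here, a timestamp such as 15:16:41123 is accepted as well, so treat this as a sketch of the trade-off rather than a recommendation.

```python
import re

# Python translation of the pattern above: (?<name>...) becomes (?P<name>...).
PATTERN = re.compile(
    r"(?P<date>(?P<day>\d{1,2})-(?P<month>\d{1,2})-(?P<year>(?:\d{4}|\d{2}))\s"
    r"(?P<time>(?P<hour>\d{2}):(?P<minutes>\d{2}):(?P<seconds>\d{2})"
    r"(?P<milli>\.?\d{0,3})))\s(?P<logEntry>.*)"
)

samples = [
    "10-10-15 15:16:41.123 Some Text here",  # dot and digits
    "10-10-15 15:16:41 Some Text here",      # no milliseconds at all
    "10-10-15 15:16:41123 Some Text here",   # digits without a dot
]

# The milli group always participates, so it is never None, only possibly empty.
milli_values = [PATTERN.match(s).group("milli") for s in samples]
print(milli_values)  # ['.123', '', '123']
```

The captured value includes the dot when one is present, so it needs stripping before being converted to a number.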

Upvotes: 0
