B_CooperA
B_CooperA

Reputation: 659

PHP - Regular expression to remove everything else but dates from a string

I'm fetching some data out of my database to be outputted inside .csv file. These outputted rows may or may not include two dates in multiple different formats, including:

YYYY-MM-DD (2019-02-01)
DD.MM.YYYY (01.02.2019)
D.M.YYYY (1.2.2019)
DD.MM.YY (01.02.19)
D.MM.YY (1.02.19)
D.MM.YYYY (1.02.2019)

These two dates are usually located at the end of the string, however there are few occasions where the dates are in the middle of the string. Here's a few example rows outputted from the database:

Product 1, 1.10.2018 - 31.12.2018 just a test string
Product 2 15.12.18-23.6.19

What I'd like to do now is to parse both of these dates in one format (DD.MM.YYYY) if found from the string into own variables.

If dates are not found in the string, the row can be ignored for now. Here's what I've tried so far:

<?php

        function contains_date($str) {
          if (preg_match('/\b(\d{4})-(\d{2})-(\d{2})\b/', $str, $matches)) {
            if (checkdate($matches[2], $matches[3], $matches[1])) {
              return true;
            }
          }
          return false;
        }

        $i = 0;
        $table = [];
        while($row = $stmt->fetch()) {
        if(contains_date($row['product'])) {
          $product = preg_replace('/\s+/', '', $row['product']);
          $date = substr($product, -21);
          $periodStart = date('d.m.Y', strtotime(substr($date, 0, 10)));
          $periodEnd = date('d.m.Y', strtotime(substr($date, 11)));

        }

        $table[$i]['product'] = $row['product'];
        $table[$i]['startDate'] = $periodStart;
        $table[$i]['endDate'] = $periodEnd;

        $i++;
        }
        ?>

This does work on cases where the date is in YYYY-MM-DD format and if the two dates are found at the end of the string. It does not cover all the cases where date format is different nor if the dates are found in the middle of the string. Any help would be much appreciated!

Upvotes: 2

Views: 329

Answers (1)

Wiktor Stribiżew
Wiktor Stribiżew

Reputation: 626758

You may use identically named groups to match different date formats with one regex:

$rx = '/(?J)(?<!\d)(?:(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})|(?<day>\d{1,2})\.(?<month>\d{1,2})\.(?<year>\d{2}(?:\d{2})?))(?!\d)/';

See the regex demo.

POIs:

  • Identically named groups in one pattern are enabled with the J modifier
  • (?<!\d) negative lookbehind at the start of the regex cancels the match after a digit (it matches a location that is not immediately preceded with a digit)
  • (?!\d) negative lookahead at the end of the regex cancels the match before a digit (it matches a location that is not immediately followed with a digit)

PHP demo:

$strs = ['YYYY-MM-DD (2019-02-01)', 'DD.MM.YYYY (01.02.2019)', 'D.M.YYYY (1.2.2019)','DD.MM.YY (01.02.19)','D.MM.YY (1.02.19)','D.MM.YYYY (1.02.2019)','Product 1, 1.10.2018 - 31.12.2018 just a test string','Product 2 15.12.18-23.6.19' ];
$rx = '/(?J)(?<!\d)(?:(?<year>\d{4})-(?<month>\d{1,2})-(?<day>\d{1,2})|(?<day>\d{1,2})\.(?<month>\d{1,2})\.(?<year>\d{2}(?:\d{2})?))(?!\d)/';
foreach ($strs as $s) {
    echo "INPUT: $s\n";
    if (preg_match_all($rx, $s, $matches, PREG_SET_ORDER, 0)) {
        foreach ($matches as $m) {
            echo "DAY: " . $m["day"] . "\nMONTH: " . $m["month"] . "\nYEAR: " . $m["year"] . "\n\n";
        }
    }
}

Output:

INPUT: YYYY-MM-DD (2019-02-01)
DAY: 01
MONTH: 02
YEAR: 2019

INPUT: DD.MM.YYYY (01.02.2019)
DAY: 01
MONTH: 02
YEAR: 2019

INPUT: D.M.YYYY (1.2.2019)
DAY: 1
MONTH: 2
YEAR: 2019

INPUT: DD.MM.YY (01.02.19)
DAY: 01
MONTH: 02
YEAR: 19

INPUT: D.MM.YY (1.02.19)
DAY: 1
MONTH: 02
YEAR: 19

INPUT: D.MM.YYYY (1.02.2019)
DAY: 1
MONTH: 02
YEAR: 2019

INPUT: Product 1, 1.10.2018 - 31.12.2018 just a test string
DAY: 1
MONTH: 10
YEAR: 2018

DAY: 31
MONTH: 12
YEAR: 2018

INPUT: Product 2 15.12.18-23.6.19
DAY: 15
MONTH: 12
YEAR: 18

DAY: 23
MONTH: 6
YEAR: 19

Upvotes: 2

Related Questions