Kees Sonnema

Reputation: 5784

Remove everything from a string except a date with a certain pattern

I'm trying to remove everything but a date (dd-mm-yyyy) from a string that I fetch from the database in a foreach.

I could have simply removed all letters ([A-Z], [a-z], etc.), but there are numbers in between the text as well.

$strings = [
    'Originele startdatum ',
    'Op verzoek van klant de ingangsdatum gelijkgetrokken met alle andere abonnementen zodat er maar 1 factuur wordt verstuurd.'
];

$result[] = [
    'AboOpmerking' => str_replace($strings, '', $row['AboOpmerking']),
];

The untouched strings look like this:

Example 1:

Originele startdatum 3-10-2017

Example 2:

Originele startdatum 1-1-2014 Op verzoek van klant de ingangsdatum gelijkgetrokken met alle andere abonnementen zodat er maar 1 factuur wordt verstuurd.

I found this regex, but I don't know how to use it: it gives me an empty array when I print $matches.

^([0]?[1-9]|[1|2][0-9]|[3][0|1])[./-]([0]?[1-9]|[1][0-2])[./-]([0-9]{4}|[0-9]{2})$

Upvotes: 1

Views: 650

Answers (2)

The fourth bird

Reputation: 163277

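As suggested, an alternative is to match a date-like format with \d{1,2}-\d{1,2}-\d{4} and then create a DateTime from the match with an explicit format to verify that it is a valid date. Note that DateTime::createFromFormat rolls over an out-of-range day such as 31-2-2017 into the next month instead of returning false, so this check mainly confirms that the match can be parsed.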

To remove only the first date, you could check for it with preg_match and then call preg_replace with 1 as the fourth argument (the limit), so that at most one replacement is made.

$strings = [
    'Originele startdatum 3-10-2017',
    'Originele startdatum 3-10-2017 3-10-2018 ',
    'Originele startdatum 1-1-2014 Op verzoek van klant de ingangsdatum gelijkgetrokken met alle andere abonnementen zodat er maar 1 factuur wordt verstuurd.'
];

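// Date-like pattern: 1-2 digit day and month, 4-digit year
$pattern = '/\d{1,2}-\d{1,2}-\d{4}/';
foreach ($strings as $string) {
    // Proceed only if a date-like substring exists and DateTime can parse it
    if (preg_match($pattern, $string, $matches) === 1 && false !== DateTime::createFromFormat('d-m-Y', $matches[0])) {
        // Limit of 1: remove only the first date
        echo preg_replace($pattern, "", $string, 1) . "<br>";
    }
}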
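
Since the question asks to keep only the date, the same pattern can also be used to extract it rather than strip it. A minimal sketch, assuming PHP 7.1 or later: extractFirstDate is a hypothetical helper name, capture groups are added to the pattern, and checkdate() is swapped in for DateTime::createFromFormat so that impossible dates such as 31-2-2017 are rejected.

```php
<?php
// Hypothetical helper (not part of the answer above): returns the first
// valid d-m-yyyy date found in $text, or null when there is none.
function extractFirstDate(string $text): ?string
{
    // Same shape as the pattern above, with day, month and year captured.
    $pattern = '/(\d{1,2})-(\d{1,2})-(\d{4})/';
    $candidates = [];
    preg_match_all($pattern, $text, $candidates, PREG_SET_ORDER);

    foreach ($candidates as [$date, $day, $month, $year]) {
        // checkdate() expects month, day, year (in that order).
        if (checkdate((int) $month, (int) $day, (int) $year)) {
            return $date;
        }
    }
    return null;
}

$rows = [
    'Originele startdatum 3-10-2017',
    'Originele startdatum 1-1-2014 Op verzoek van klant de ingangsdatum gelijkgetrokken met alle andere abonnementen zodat er maar 1 factuur wordt verstuurd.',
    'Originele startdatum ',
];

foreach ($rows as $row) {
    // Prints the date, or a placeholder when the row contains none.
    echo extractFirstDate($row) ?? '(no date)', PHP_EOL;
}
```

The lone "1" in "maar 1 factuur" is ignored because it does not fit the day-month-year shape.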

Upvotes: 1

Wiktor Stribiżew

Reputation: 626758

You may replace ^ (which matches the start of the string) and $ (which matches the end of the string) with \b (word boundaries) to match the date substrings as whole words, and use preg_match (to extract only the first match) or preg_match_all (if there is more than one):

preg_match('~\b(?:0?[1-9]|[12][0-9]|3[01])([./-])(?:0?[1-9]|1[0-2])\1(?:[0-9]{4}|[0-9]{2})\b~', $s, $matches);

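
To show how the results come back, here is a small sketch that runs the word-boundary pattern through both functions. The input string is borrowed from the other answer's sample data, and the variable names $first and $all are only illustrative.

```php
<?php
$pattern = '~\b(?:0?[1-9]|[12][0-9]|3[01])([./-])(?:0?[1-9]|1[0-2])\1(?:[0-9]{4}|[0-9]{2})\b~';
$s = 'Originele startdatum 3-10-2017 3-10-2018 ';

// preg_match stops at the first match:
// $first[0] is the whole date, $first[1] is the captured separator.
preg_match($pattern, $s, $first);
echo $first[0], PHP_EOL;               // 3-10-2017

// preg_match_all collects every match; $all[0] lists the whole dates.
preg_match_all($pattern, $s, $all);
echo implode(', ', $all[0]), PHP_EOL;  // 3-10-2017, 3-10-2018
```

Because of the \1 backreference, both separators must be the same character, so a mixed form such as 3-10.2017 is not matched.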

An alternative to word boundaries is the pair of lookarounds (?<!\d) and (?!\d), which help when the dates can be glued to letters or appear between underscores:

preg_match('~(?<!\d)(?:0?[1-9]|[12][0-9]|3[01])([./-])(?:0?[1-9]|1[0-2])\1(?:[0-9]{4}|[0-9]{2})(?!\d)~', $s, $matches);
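
The difference between the two variants shows up when a date touches a word character. A short sketch, assuming a made-up input in which the date sits between underscores; the variable names are illustrative only.

```php
<?php
// Shared date body, wrapped in either word boundaries or digit lookarounds.
$date = '(?:0?[1-9]|[12][0-9]|3[01])([./-])(?:0?[1-9]|1[0-2])\1(?:[0-9]{4}|[0-9]{2})';
$withBoundaries  = '~\b' . $date . '\b~';
$withLookarounds = '~(?<!\d)' . $date . '(?!\d)~';

// Made-up sample: "_" is a word character, so there is no \b next to the date.
$s = 'abo_1-1-2014_verlengd';

$boundaryHits   = preg_match($withBoundaries, $s);
$lookaroundHits = preg_match($withLookarounds, $s, $m);

echo $boundaryHits, PHP_EOL;    // 0
echo $lookaroundHits, PHP_EOL;  // 1
echo $m[0], PHP_EOL;            // 1-1-2014
```

The lookarounds only require that no digit sits directly before or after the date, which is why letters and underscores next to it are accepted.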

Upvotes: 1
