Brian Bruman

PHP preg_replace for All Types of Year Formats (YYYY, YYYY-YYYY, YYYY - YYYY)

I'm trying to use preg_replace to modify a string (adding an HTML line break) that contains year formats like 2018, 1950-2018 and 1950 - 2018.

$j = preg_replace('/([0-9]{4}) - ([0-9]{4})/', '<br>* ${1} - ${2}</strong>', $j);
$j = preg_replace('/([0-9]{4})-([0-9]{4})/', '<br>* ${1} - ${2}', $j);
$j = preg_replace('/\s+(19[5-9][0-9]|20(0[0-9]|10))\s+/', '<br>* ${1} </strong>', $j);

Ideally, the regex would only match years from 1950 to 2020.

The first two work fine (although I had trouble getting the year range right), but the last one catches every instance, like:

* 2007
** 2008 - 2013

etc.

I tried using ^ and $ to anchor the beginning and end, but the third pattern still matches the years inside the first two formats.

How can I completely separate these year formats so I can uniquely change each one individually?

Sample Code:

<?php

$string = 'Detailed Applications: 2005-2006 Volkswagen | 2006 Volkswagen Golf 2.0L 1984CC 121Cu. In. l4 GAS SOHC Naturally Aspirated | 2005 Volkswagen Beetle 2.0L 1984CC 121Cu. In. l4 GAS DOHC Naturally Aspirated | 2005 - 2006 Volkswagen Golf';

echo $string;

echo '<br><br>';

$string = preg_replace('/([0-9]{4}) - ([0-9]{4})/', '<br /><strong>(YYYY - YYYY)* ${1} - ${2}</strong>', $string);
$string = preg_replace('/([0-9]{4})-([0-9]{4})/', '<br /><strong>(YYYY-YYYY)* ${1} - ${2}</strong>', $string);
$string = preg_replace('/(\d19[5-9][0-9]|20[0-9][0-9])(?!\s?-)/', '<br /><strong>(YYYY)* ${1} </strong>', $string);

echo $string;

Output:

Detailed Applications: 
(YYYY-YYYY)* 2005 - 
(YYYY)* 2006 Volkswagen | 
(YYYY)* 2006 Volkswagen Golf 2.0L 1984CC 121Cu. In. l4 GAS SOHC Naturally Aspirated | 
(YYYY)* 2005 Volkswagen Beetle 2.0L 1984CC 121Cu. In. l4 GAS DOHC Naturally Aspirated | 
(YYYY - YYYY)* 2005 - 
(YYYY)* 2006

Sorry, I'm really confused.

Basically, I'm trying to insert a line break before each year entry (without exploding the string), but with my regexes each year ends up on its own line, splitting the ranges apart.

I'm trying to get output like this:

Detailed Applications: 
(YYYY-YYYY)* 2005 - 2006 Volkswagen | 
(YYYY)* 2006 Volkswagen Golf 2.0L 1984CC 121Cu. In. l4 GAS SOHC Naturally Aspirated | 
(YYYY)* 2005 Volkswagen Beetle 2.0L 1984CC 121Cu. In. l4 GAS DOHC Naturally Aspirated | 
(YYYY - YYYY)* 2005 - 2006

Here's the best I've managed so far:

$j = preg_replace('/([0-9]{4}) - ([0-9]{4})/', '<br /><strong>* ${1} - ${2}</strong>', $j);
$j = preg_replace('/([0-9]{4})-([0-9]{4})/', '<br /><strong>* ${1} - ${2}</strong>', $j);
$j = preg_replace('/(19[5-9][0-9]|20(0[0-9]|20))(?!\s?-)/', '<br /><strong>* ${1} </strong>', $j);

Here is an actual excerpt of the data my script processes:

2007 Chevy Silverado Pickup new body style models 2008-2013 Chevy Silverado All Models 2014 Chevy Silverado 2500HD 3500HD 2007 GMC Sierra Pickup new body style models 2008-2013 GMC Sierra All Models 2014 GMC Sierra 2500HD 3500HD 2007-2013 Chevy Tahoe 2007-2013 Chevy Suburban 2007-2013 Chevy Avalanche 2007-2013 GMC Yukon Yukon XL Yukon Denali

It is all on one line. (I posted the sample script above because this line does not include a YYYY - YYYY variation.)

It outputs this:

* 2007 Chevy Silverado Pickup new body style models 
* 2008 - 2013 Chevy Silverado All Models 2014 Chevy Silverado 2500HD 3500HD 
* 2007 GMC Sierra Pickup new body style models 
* 2008 - 2013 GMC Sierra All Models 2014 GMC Sierra 2500HD 3500HD 
* 2007 - 2013 Chevy Tahoe 
* 2007 - 2013 Chevy Suburban 
* 2007 - 2013 Chevy Avalanche 
* 2007 - 2013 GMC Yukon Yukon XL Yukon Denali

Everything is good except "* 2008 - 2013 Chevy Silverado All Models 2014 Chevy Silverado 2500HD 3500HD": the 2014 isn't breaking onto a new line. I also can't figure out how to restrict the year range yet, even after referencing the JavaScript question "regex validate years in range".
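A quick check suggests why 2014 is skipped: in the third pattern of the "best so far" snippet, the alternation 20(0[0-9]|20) only covers 2000 to 2009 and 2020, so 2010 to 2019 never match. The sketch below lists which years between 1950 and 2020 that alternation accepts; the ^...$ anchors are my addition (in place of the original (?!\s?-) lookahead) so that each year is tested on its own.

```php
<?php
// Year alternation from the third pattern of the "best so far" snippet,
// anchored (an assumption for this check) so each candidate must match whole.
$pattern = '/^(19[5-9][0-9]|20(0[0-9]|20))$/';

$missed = [];
foreach (range(1950, 2020) as $year) {
    if (!preg_match($pattern, (string) $year)) {
        $missed[] = $year; // a year in the desired range that the pattern rejects
    }
}

// Print the years in 1950-2020 that the pattern cannot match.
echo implode(' ', $missed), "\n";
```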

Answers (2)

mickmackusa

preg_replace_callback() will let you validate/extract your targeted substrings and make conditional replacements in one pass.

  • A line break must be written before every year or year range, unless it is at the very start of the string (the code below writes "\n"; use "<br>" for HTML output). The first capture group is \s*, so it captures zero or more white-space characters before the targeted year or year range. This element always exists as [1] in the $m array.
  • The second capture group is the first or only year value. It must match for the callback to be called at all. This is [2] in the $m array.
  • The optional second year value must follow zero or more white-space characters, then a hyphen, then zero or more white-space characters. Because no capture groups follow this one, PHP only generates an element for it when it is found, so isset() is used to check whether [3] exists.
  • Originally, I used \b to ensure that the year values were not substrings of longer numbers, but the format of your strings allows white-space matching to confirm an accurate match.
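To see the $m shapes described in the bullets without the commented-out var_export() call, this sketch runs the same pattern through preg_match() on one single year and one range. The two test strings are made up for illustration; the element counts show that [3] only exists when a second year was captured.

```php
<?php
// Same pattern as the preg_replace_callback() call in this answer.
$pattern = '~(\s*)(19[5-9]\d|20[0-4]\d)(?:\s*-\s*(19[5-9]\d|20[0-4]\d))?(?=\s)~';

preg_match($pattern, ' 2014 Chevy', $single);
preg_match($pattern, ' 2008 - 2013 Chevy', $range);

// Single year: [0] full match, [1] leading white-space, [2] the year.
// The unmatched trailing group 3 is simply omitted from the array.
echo count($single), ' ', isset($single[3]) ? 'has [3]' : 'no [3]', "\n";

// Range: [3] holds the second year because that group participated.
echo count($range), ' ', $range[3], "\n";
```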

Code:

$string = "2007 Chevy Silverado Pickup new body style models 2008-2013 Chevy Silverado All Models 2014 Chevy Silverado 2500HD 3500HD 2007 GMC Sierra Pickup new body style models 2008 - 2013 GMC Sierra All Models 2014 GMC Sierra 2500HD 3500HD 2007-2013 Chevy Tahoe 2007-2013 Chevy Suburban 2007   -   2013 Chevy Avalanche 2007-2013 GMC Yukon Yukon XL Yukon Denali";

echo preg_replace_callback('~(\s*)(19[5-9]\d|20[0-4]\d)(?:\s*-\s*(19[5-9]\d|20[0-4]\d))?(?=\s)~', function($m) {
    //var_export($m);  // un-comment if you want to see each $m array
    //echo "\n---\n";
    return (strlen($m[1]) ? "\n" : "")     // no break at the very start of the string
        . "<strong>*{$m[2]}"
        . (isset($m[3]) ? " - {$m[3]}" : "")  // second year only for ranges
        . "</strong>";
}, $string);

Output:

<strong>*2007</strong> Chevy Silverado Pickup new body style models
<strong>*2008 - 2013</strong> Chevy Silverado All Models
<strong>*2014</strong> Chevy Silverado 2500HD 3500HD
<strong>*2007</strong> GMC Sierra Pickup new body style models
<strong>*2008 - 2013</strong> GMC Sierra All Models
<strong>*2014</strong> GMC Sierra 2500HD 3500HD
<strong>*2007 - 2013</strong> Chevy Tahoe
<strong>*2007 - 2013</strong> Chevy Suburban
<strong>*2007 - 2013</strong> Chevy Avalanche
<strong>*2007 - 2013</strong> GMC Yukon Yukon XL Yukon Denali
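The pattern above accepts 1950 to 2049 and writes "\n". If you need exactly the 1950 to 2020 range from the question and an HTML <br> instead, a minimal variation of the same callback might look like the following. The $years sub-pattern and the shortened sample string are my own constructions, not part of the original answer.

```php
<?php
// Years 1950-2020 only: 1950-1999, 2000-2019, or 2020 (non-capturing).
$years = '(?:19[5-9]\d|20[01]\d|2020)';

// Same structure as the answer: leading white-space, first year,
// optional "- second year", then a trailing white-space lookahead.
$pattern = "~(\s*)($years)(?:\s*-\s*($years))?(?=\s)~";

$string = "2007 Chevy Silverado 2008-2013 Chevy Silverado All Models 2014 Chevy Silverado 2500HD 3500HD";

$html = preg_replace_callback($pattern, function ($m) {
    return (strlen($m[1]) ? "<br>" : "")   // no break at the very start
        . "<strong>*{$m[2]}"
        . (isset($m[3]) ? " - {$m[3]}" : "")
        . "</strong>";
}, $string);

echo $html, "\n";
```

Like the original, the (?=\s) lookahead means a year at the very end of the string is not matched; "2500HD" and "3500HD" are skipped because they fall outside the year range and are followed by letters.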

Tim Biegeleisen

One way to simplify your replacement logic is to recognize that you want to add a <br> after every 4-digit year that is not followed by either a dash, or a space and a dash. We can easily phrase this using a negative lookahead:

(\d{4})(?!\s?-)

Code sample:

$input = "that contains year formats like 2018 1950-2018 and 1950 - 2018";
echo preg_replace("/(\d{4})(?!\s?-)/", "$1<br>", $input);

Output:

that contains year formats like 2018<br> 1950-2018<br> and 1950 - 2018<br>
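One caveat worth checking: \d{4} also matches digits embedded in longer tokens, such as "1984CC" in the question's sample string, so this pattern would insert a <br> after "1984" there. A sketch that keeps the negative lookahead but adds \b word boundaries and limits the years to 1950 to 2020 (my own combination, not part of this answer), run on a shortened version of the question's sample:

```php
<?php
// 1950-2020 only, bounded by \b so digits inside tokens like "1984CC" are skipped.
// The negative lookahead is unchanged: skip a year followed by "-" or " -".
$pattern = '/\b(19[5-9]\d|20[01]\d|2020)\b(?!\s?-)/';

$input = 'Applications: 2005-2006 Volkswagen | 2006 Volkswagen Golf 2.0L 1984CC | 2005 - 2006 Volkswagen Golf';

$out = preg_replace($pattern, '$1<br>', $input);
echo $out, "\n";
```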
