Reputation: 1189
I have the months of the year in Italian stored in an array like this:
$array = [
"gennaio" => "1",
"febbraio" => "2",
"marzo" => "3",
"aprile" => "4",
"maggio" => "5",
"giugno" => "6",
"luglio" => "7",
"agosto" => "8",
"settembre" => "9",
"ottobre" => "10",
"novembre" => "11",
"dicembre" => "12"
];
I also have these titles:
$title1 = "Nota_382_del_16 marzo_2016.pdf";
$title2 = "OCDPC 382 del 16_agosto_2016.pdf";
$title3 = "OCDPC_382_del 16 _agosto 2016.pdf";
$title4 = "OCDPC_382_dal 16agosto 2016.pdf";
$title5 = "OCDPC_382 dall 16luglio2016.pdf";
$title6 = "OCDPC_382 da 16agosto_2016.pdf";
$title7 = "OCDPC_382_del_16_settembre 2016.pdf";
$title8 = "OCDPC_382 di 16 _agosto.2016.pdf";
$title9 = "OCDPC_382_del-16-agosto 2016.pdf";
$title10 = "Dipartimento OCDPC 382_del-16-agosto-2016.pdf";
$title11 = "OCDPC_382 dall'16-febbraio-2016.pdf";
$title12 = "OCDPC_382 dal'16-agosto-2016 - Dipartimentocivile.pdf";
From each title I want to extract the full date, like 16 settembre 2016, and then format it like 16/09/2016.
I have no problem formatting the date; my main issue is finding the correct regex to capture it and then converting the month name to a number. I can probably manage the month conversion with a switch statement.
Any type of help will be appreciated!
Edit: So far I have managed this:
(?<![^\W_])?del?\s*\K\d+.?\d+.?20[0-2][0-9]
The current regex catches the date only when the month is given as a number, not as a name.
That is a very specific case, and I'm not a regex expert...
Upvotes: 0
Views: 67
Reputation: 6148
Although this has already been answered by @WiktorStribiżew, I'd suggest a slightly different take on the regex...
/(\d\d?)[._ -]*([a-z]+)[._ -]*(\d{4})/i
/ : Pattern delimiter
(\d\d?) : Matches the day (1 or 2 digits) and assigns it to a capture group
[._ -]* : Matches a delimiter 0 or more times
([a-z]+) : Matches the textual month and assigns it to a capture group
[._ -]* : Matches a delimiter 0 or more times
(\d{4}) : Matches the year (4 digits) and assigns it to a capture group
/ : Pattern delimiter
i : Makes the regex case-insensitive, just in case
...which is a bit easier to read and understand. It's also slightly more specific about the date delimiters (or the lack of them), so you're less likely to get false matches.
$months = [
"gennaio" => "1",
"febbraio" => "2",
"marzo" => "3",
"aprile" => "4",
"maggio" => "5",
"giugno" => "6",
"luglio" => "7",
"agosto" => "8",
"settembre" => "9",
"ottobre" => "10",
"novembre" => "11",
"dicembre" => "12",
];
$titles = [
"Nota_382_del_16 marzo_2016.pdf",
"OCDPC 382 del 16_agosto_2016.pdf",
"OCDPC_382_del 16 _agosto 2016.pdf",
"OCDPC_382_dal 16agosto 2016.pdf",
"OCDPC_382 dall 16luglio2016.pdf",
"OCDPC_382 da 16agosto_2016.pdf",
"OCDPC_382_del_16_settembre 2016.pdf",
"OCDPC_382 di 16 _agosto.2016.pdf",
"OCDPC_382_del-16-agosto 2016.pdf",
"Dipartimento OCDPC 382_del-16-agosto-2016.pdf",
"OCDPC_382 dall'16-febbraio-2016.pdf",
"OCDPC_382 dal'16-agosto-2016 - Dipartimentocivile.pdf",
];
foreach ($titles as $title) {
    // Skip any title that contains no recognisable date.
    if (!preg_match('/(\d\d?)[._ -]*([a-z]+)[._ -]*(\d{4})/i', $title, $dateParts)) {
        continue;
    }
    echo $dateParts[1], "/", $months[strtolower($dateParts[2])], "/", $dateParts[3], " ", PHP_EOL;
}
/* Potentially easier to read version:
foreach ($titles as $title) {
    preg_match('/(\d\d?)[._ -]*([a-z]+)[._ -]*(\d{4})/i', $title, $dateParts);
    list(, $day, $month, $year) = $dateParts;
    $month = $months[strtolower($month)];
    echo "$day/$month/$year", PHP_EOL;
}
*/
Output:
16/3/2016
16/8/2016
16/8/2016
16/8/2016
16/7/2016
16/8/2016
16/9/2016
16/8/2016
16/8/2016
16/8/2016
16/2/2016
16/8/2016
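The output above leaves the month unpadded, whereas the question asks for the form 16/09/2016. Here is a minimal sketch of one way to zero-pad the day and month with sprintf, assuming the same pattern and month mapping as above. The helper name formatItalianDate is hypothetical, and returning null for titles without a known month is an assumption of this sketch, not something from the question.

```php
<?php
// Same month mapping as above, with integer values for convenience.
$months = [
    "gennaio" => 1, "febbraio" => 2, "marzo" => 3, "aprile" => 4,
    "maggio" => 5, "giugno" => 6, "luglio" => 7, "agosto" => 8,
    "settembre" => 9, "ottobre" => 10, "novembre" => 11, "dicembre" => 12,
];

// Hypothetical helper: returns "dd/mm/yyyy", or null when no known date is found.
function formatItalianDate(string $title, array $months): ?string
{
    if (!preg_match('/(\d\d?)[._ -]*([a-z]+)[._ -]*(\d{4})/i', $title, $m)) {
        return null; // no day/word/year sequence in the title
    }
    $month = $months[strtolower($m[2])] ?? null;
    if ($month === null) {
        return null; // the word between day and year is not a month name
    }
    // %02d pads day and month to two digits.
    return sprintf('%02d/%02d/%04d', $m[1], $month, $m[3]);
}

echo formatItalianDate("OCDPC_382_del_16_settembre 2016.pdf", $months), PHP_EOL; // 16/09/2016
echo formatItalianDate("OCDPC_382 dall'16-febbraio-2016.pdf", $months), PHP_EOL; // 16/02/2016
```

Looking the month up with ?? avoids an undefined-index notice when the matched word is not one of the twelve month names.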
Upvotes: 2
Reputation: 627609
You can use
(?<!\d)\d{1,2}[\W_]*\p{L}+[\W_]*\d{4}(?!\d)
(?<!\d)(\d{1,2})[\W_]*(\p{L}+)[\W_]*(\d{4})(?!\d) // With numbered groups
(?<!\d)(?P<day>\d{1,2})[\W_]*(?P<month>\p{L}+)[\W_]*(?P<year>\d{4})(?!\d) // With named groups
See the regex demo. Details:
(?<!\d) - no digit allowed immediately to the left of the current location
\d{1,2} - one or two digits
[\W_]* - zero or more non-alphanumeric chars
\p{L}+ - one or more Unicode letters
[\W_]* - zero or more non-alphanumeric chars
\d{4} - four digits (\d{2}(?:\d{2})? can be used if there can be a 2-digit year)
(?!\d) - no digit allowed immediately to the right of the current location.
In PHP, you can use it like
if (preg_match('~(?<!\d)(?P<day>\d{1,2})[\W_]*(?P<month>\p{L}+)[\W_]*(?P<year>\d{4})(?!\d)~u', $string, $match)) {
    echo $match["day"] . PHP_EOL;
    echo $match["month"] . PHP_EOL;
    echo $match["year"];
}
See the PHP demo.
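Because \p{L}+ accepts any word, not only month names, the named-group pattern can be paired with the month array from the question to reject false matches. What follows is a rough sketch under that assumption. The function name extractItalianDate and the "bozza" sample title are made up for illustration, and preg_match_all is used so that a non-month word sitting between two numbers does not hide a later real date.

```php
<?php
$months = [
    "gennaio" => 1, "febbraio" => 2, "marzo" => 3, "aprile" => 4,
    "maggio" => 5, "giugno" => 6, "luglio" => 7, "agosto" => 8,
    "settembre" => 9, "ottobre" => 10, "novembre" => 11, "dicembre" => 12,
];

// The named-group pattern from this answer.
$pattern = '~(?<!\d)(?P<day>\d{1,2})[\W_]*(?P<month>\p{L}+)[\W_]*(?P<year>\d{4})(?!\d)~u';

// Hypothetical helper: returns the first candidate whose word is a known month.
function extractItalianDate(string $title, string $pattern, array $months): ?string
{
    if (!preg_match_all($pattern, $title, $candidates, PREG_SET_ORDER)) {
        return null; // no day/word/year sequence at all
    }
    foreach ($candidates as $m) {
        $key = strtolower($m['month']); // Italian month names are plain ASCII
        if (isset($months[$key])) {
            return sprintf('%02d/%02d/%04d', $m['day'], $months[$key], $m['year']);
        }
    }
    return null; // candidates were found, but none used a month name
}

echo extractItalianDate("Dipartimento OCDPC 382_del-16-agosto-2016.pdf", $pattern, $months), PHP_EOL;
// Made-up title: "2 bozza 2015" matches the pattern but is skipped because "bozza" is not a month.
echo extractItalianDate("Rev 2 bozza 2015 del 16 agosto 2016.pdf", $pattern, $months), PHP_EOL;
```

Both calls print 16/08/2016. Matches are non-overlapping, so a false candidate that swallows part of a real date would still hide it; none of the titles in the question trigger that case.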
Upvotes: 2