Reputation: 725
I know there are many regex questions on Stack Overflow, and I have gone over my code again and again, but as a newcomer to regex and PHP in general I just don't understand what is going wrong. I have a list of file names such as:
1000032842_WMN_2150_cv.pdf
1000041148_BKO_111_SY_bj.pdf
000048316_ED_3100_AMW_2_a.pdf
1000041231_HF_210_WPO_cr.pdf
I am trying to extract only the last lowercase characters: cv, bj, a, cr
I am using the following regex to try and do that: [a-z.]+$
1) Is the regex correct?
2) What would be the right PHP function to use to extract that portion of these strings?
I have tried preg_match and preg_split, but I am not sure which one I should really use. I think preg_split is the correct function.
$url = "1000036112_GKV_35_VM_32_a.pdf";
$url = preg_split('/[a-z.]+$/', $url);
print_r ($url);
but element [1] is empty:
Array ( [0] => 1000036112_GKV_35_VM_32_ [1] => )
UPDATE EDIT
The following gives a list of int 0, int 1, etc.
<?php
$filename = "urls.csv";
$handle = fopen($filename, "r");
if ($handle !== FALSE) {
while (($data=fgetcsv($handle,99999,',')) !== FALSE) {
$url = $data[1];
var_dump (preg_match_all('/_([a-z]{1,2})\./', $url));
}
}
?>
Upvotes: 2
Views: 122
Reputation: 43169
Although you have already accepted an answer, why not use something as simple as:
_([a-z]+)
For your code, this would come down to:
<?php
$filename = "urls.csv";
$handle = fopen($filename, "r");
$regex = '~_([a-z]+)~';
if ($handle !== FALSE) {
    while (($data = fgetcsv($handle, 99999, ',')) !== FALSE) {
        $url = $data[1];
        preg_match_all($regex, $url, $matches);
        // your matches are in the $matches array
    }
    // release the file handle once every row has been read
    fclose($handle);
}
?>
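To make the "matches are in the $matches array" part concrete, here is a minimal sketch. It assumes the four sample file names from the question stand in for the CSV column, and $suffixes is just an illustrative variable name. Note that preg_match_all() itself only returns the number of matches; the matched text is written to its third argument.

```php
<?php
// Sample file names from the question, used here instead of reading urls.csv.
$urls = array(
    '1000032842_WMN_2150_cv.pdf',
    '1000041148_BKO_111_SY_bj.pdf',
    '000048316_ED_3100_AMW_2_a.pdf',
    '1000041231_HF_210_WPO_cr.pdf',
);
$regex = '~_([a-z]+)~';

$suffixes = array(); // illustrative name, not part of the original code
foreach ($urls as $url) {
    // The return value is the match count; the matched text goes into $matches.
    $count = preg_match_all($regex, $url, $matches);
    if ($count > 0) {
        // $matches[1] lists what capture group 1 caught; keep the last hit.
        $suffixes[] = end($matches[1]);
    }
}
print_r($suffixes);
```

For the four sample names this collects cv, bj, a and cr.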
Upvotes: 1
Reputation: 9654
Try this:
[a-z]+(?=\.pdf)
Here (?=\.pdf) is a lookahead: the pattern matches one or more letters [a-z] only if .pdf comes right after them.
If you will have extensions other than .pdf, use this regex instead, which combines a lookbehind and a lookahead to grab letters preceded by _ and followed by a dot:
(?<=_)[a-z]+(?=\.)
Getting the needed strings using PHP:
$urls = array('1000032842_WMN_2150_cv.pdf', '1000041148_BKO_111_SY_bj.pdf', '000048316_ED_3100_AMW_2_a.pdf', '1000041231_HF_210_WPO_cr.pdf');
foreach ($urls as $url) {
    // No /i flag: [a-z] has to stay case-sensitive so only lowercase letters match
    if (preg_match('/(?<=_)[a-z]+(?=\.)/', $url, $match)) {
        echo $match[0].'<br>';
    }
}
Output:
cv
bj
a
cr
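To see where the two patterns in this answer differ, here is a small sketch. The second file name with a .txt extension is made up for illustration, and the array keys pdfOnly and anyExt are hypothetical labels.

```php
<?php
// One name from the question plus an invented .txt variant for comparison.
$names = array('1000032842_WMN_2150_cv.pdf', '1000041148_BKO_111_SY_bj.txt');

$pdfOnly = '/[a-z]+(?=\.pdf)/';     // lowercase letters directly before ".pdf"
$anyExt  = '/(?<=_)[a-z]+(?=\.)/';  // lowercase letters between "_" and "."

$results = array();
foreach ($names as $name) {
    // preg_match() returns 1 on a match and 0 otherwise; null marks "no match".
    $results[$name] = array(
        'pdfOnly' => preg_match($pdfOnly, $name, $m) ? $m[0] : null,
        'anyExt'  => preg_match($anyExt, $name, $m) ? $m[0] : null,
    );
}
print_r($results);
```

Both patterns find cv in the .pdf name, but only the lookbehind/lookahead version finds bj in the .txt name; the .pdf-specific pattern finds nothing there.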
Upvotes: 2
Reputation: 18763
_(?<your_group_name>[a-z]{1,2})\.
<?php
$matches = array();
preg_match_all(
'/_([a-z]{1,2})\./',
"1000032842_WMN_2150_cv.pdf
1000041148_BKO_111_SY_bj.pdf
000048316_ED_3100_AMW_2_a.pdf
1000041231_HF_210_WPO_cr.pdf",
$matches
);
var_dump($matches);
?>
array(2) {
[0]=>
array(4) {
[0]=>
string(4) "_cv."
[1]=>
string(4) "_bj."
[2]=>
string(3) "_a."
[3]=>
string(4) "_cr."
}
[1]=>
array(4) {
[0]=>
string(2) "cv"
[1]=>
string(2) "bj"
[2]=>
string(1) "a"
[3]=>
string(2) "cr"
}
}
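The pattern at the top of this answer uses a named group, while the code uses a plain numbered one. As a rough sketch of the named variant with preg_match() on one file name at a time (the group name suffix and the variable $found are placeholders chosen for this example):

```php
<?php
$names = array(
    '1000032842_WMN_2150_cv.pdf',
    '1000041148_BKO_111_SY_bj.pdf',
    '000048316_ED_3100_AMW_2_a.pdf',
    '1000041231_HF_210_WPO_cr.pdf',
);

$found = array(); // placeholder name for the collected results
foreach ($names as $name) {
    // "suffix" is a placeholder group name; the same text is also in $m[1].
    if (preg_match('/_(?<suffix>[a-z]{1,2})\./', $name, $m)) {
        $found[] = $m['suffix'];
    }
}
print_r($found);
```

This yields cv, bj, a and cr. A named key can be easier to read than $m[1] once a pattern has several groups.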
Upvotes: 1