mlclm

Reputation: 725

PHP Regex to extract only portions of different strings

I know there are many regex questions on Stack Overflow, and I've studied my code again and again, but as a newbie to regex and PHP in general I just don't understand it. I have a list of file names such as:

1000032842_WMN_2150_cv.pdf

1000041148_BKO_111_SY_bj.pdf

000048316_ED_3100_AMW_2_a.pdf

1000041231_HF_210_WPO_cr.pdf

I am trying to extract the last lowercase characters only: cv, bj, a, cr

I am using the following regex to try and do that: [a-z.]+$

Regex101

1) Is the regex correct?

2) What would be the right PHP function to use to extract that portion of these strings?

I have tried preg_match and preg_split, but I am not sure which one I should really use. I think preg_split is the correct function.

$url = "1000036112_GKV_35_VM_32_a.pdf";
$url = preg_split('/[a-z.]+$/', $url);
print_r ($url);

but [1] is empty.

Array ( [0] => 1000036112_GKV_35_VM_32_ [1] => )

UPDATE EDIT

The following prints a list of int(0), int(1), etc. instead of the matched text:

<?php
    $filename = "urls.csv";
    $handle = fopen($filename, "r");
    if ($handle !== FALSE) {
        while (($data=fgetcsv($handle,99999,',')) !== FALSE) {
            $url = $data[1];
            var_dump (preg_match_all('/_([a-z]{1,2})\./', $url));
        }
    }
?>

Upvotes: 2

Views: 122

Answers (3)

Jan

Reputation: 43169

While you have already accepted an answer, why not use something as simple as:

_([a-z]+)

For your code, this would come down to:

<?php
$filename = "urls.csv";
$handle = fopen($filename, "r");
$regex = '~_([a-z]+)~';
if ($handle !== FALSE) {
    while (($data=fgetcsv($handle,99999,',')) !== FALSE) {
        $url = $data[1];
        preg_match_all($regex, $url, $matches);
        // your matches are in the $matches array
    }
}
?>

See a demo on regex101.com.
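As a minimal sketch of how to read the result, assuming the sample file names from the question stand in for the CSV column (the `$suffixes` array is a name made up for this illustration): preg_match_all returns the number of matches as an integer, which is why dumping its return value shows int(0), int(1), and so on. The captured text ends up in the third argument, with group 1 in `$matches[1]`.

```php
<?php
// Sample file names from the question, used here instead of reading urls.csv.
$urls = array(
    '1000032842_WMN_2150_cv.pdf',
    '1000041148_BKO_111_SY_bj.pdf',
    '000048316_ED_3100_AMW_2_a.pdf',
    '1000041231_HF_210_WPO_cr.pdf',
);
$regex = '~_([a-z]+)~';

$suffixes = array(); // hypothetical name, collects the extracted parts
foreach ($urls as $url) {
    // The return value is the number of full matches (an int), not the text.
    $count = preg_match_all($regex, $url, $matches);
    if ($count > 0) {
        // $matches[0] holds full matches such as "_cv";
        // $matches[1] holds what capture group 1 matched, such as "cv".
        $suffixes[] = $matches[1][0];
    }
}
print_r($suffixes);
```

Since each file name contains only one lowercase part after an underscore, preg_match with `$matches[1]` would likely work just as well here.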

Upvotes: 1

Mi-Creativity

Reputation: 9654

Try this:

[a-z]+(?=\.pdf)

Here (?=\.pdf) is a "lookahead": the pattern matches one or more letters [a-z] only if .pdf comes immediately after them, and the .pdf itself is not part of the match.

Regex101-1
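A minimal sketch of that first pattern in PHP, assuming one of the file names from the question as input (the `$name` and `$suffix` variables are made up for this illustration):

```php
<?php
$name = '1000041148_BKO_111_SY_bj.pdf';

// (?=\.pdf) requires ".pdf" right after the letters but does not consume it,
// so the full match in $match[0] contains only the lowercase letters.
if (preg_match('/[a-z]+(?=\.pdf)/', $name, $match)) {
    $suffix = $match[0];
    echo $suffix, PHP_EOL;
}
```

Note that this pattern does not require an underscore before the letters, so it relies on the file names following the format shown in the question.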


If you have other extensions besides .pdf, use this regex instead. It combines a lookbehind and a lookahead to grab strings that are preceded by an underscore _ and followed by a dot .

(?<=_)[a-z]+(?=\.)

Regex101-2


Getting the needed strings using PHP:


$urls = array('1000032842_WMN_2150_cv.pdf', '1000041148_BKO_111_SY_bj.pdf', '000048316_ED_3100_AMW_2_a.pdf', '1000041231_HF_210_WPO_cr.pdf');

foreach($urls as $url) {
  if (preg_match('/(?<=_)[a-z]+(?=\.)/', $url, $match)) {
    echo $match[0].'<br>';
  }
}

Output:

cv
bj
a
cr

Upvotes: 2

abc123

Reputation: 18763

Regex

_(?<your_group_name>[a-z]{1,2})\.


Debuggex Demo

PHP

<?php
    $matches = array(); 
    preg_match_all(
        '/_([a-z]{1,2})\./', 
        "1000032842_WMN_2150_cv.pdf

1000041148_BKO_111_SY_bj.pdf

000048316_ED_3100_AMW_2_a.pdf

1000041231_HF_210_WPO_cr.pdf", 
        $matches
    ); 
    var_dump($matches);
?>

Result

array(2) {
  [0]=>
  array(4) {
    [0]=>
    string(4) "_cv."
    [1]=>
    string(4) "_bj."
    [2]=>
    string(3) "_a."
    [3]=>
    string(4) "_cr."
  }
  [1]=>
  array(4) {
    [0]=>
    string(2) "cv"
    [1]=>
    string(2) "bj"
    [2]=>
    string(1) "a"
    [3]=>
    string(2) "cr"
  }
}
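The regex at the top of this answer uses a named group, while the PHP above uses a plain numbered one. As a rough sketch of the named variant, assuming a single file name as input (the group name `suffix` and the variables are made up for this illustration), the captured text is available under both the name and the index:

```php
<?php
$name = '000048316_ED_3100_AMW_2_a.pdf';

// "suffix" is a hypothetical group name; any valid name would do.
if (preg_match('/_(?<suffix>[a-z]{1,2})\./', $name, $m)) {
    $byName  = $m['suffix']; // access by group name
    $byIndex = $m[1];        // the same capture, by position
    echo $byName, PHP_EOL;
}
```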

Upvotes: 1
