CpILL

Reputation: 7009

Regex for file name parsing

I'm very glad this forum exists, as I'm not sure where else to turn with this question. I'm parsing the names of a large number of files in PHP using preg_match_all(), and I want to extract 4 pieces of information from each name. The naming convention is:

_tag_99_Nice_name.extension

I need to break this down into 4 parts:

  1. tag : I just want the "tag" part. Tags start and end with an underscore.
  2. 99_ : sorting. This is a 2-digit number followed by an underscore.
  3. .extension : just like a file extension. There may be more than one at the end; I just want the last one.
  4. Nice_name : any set of characters allowed in a file name.

The tricky part is that the first 3 are optional, so all of the following are valid examples:

_taggy_01_foo_bar.text
69_something.gif
_tag_some_thing.jpg
basic.example

My best attempt so far is:

/^(?:_+(?P<tag>[a-z0-9]+)*_)?(?:(?P<sort>\d{2})_)?/

but this is just not working and only tries to capture the first 2 parts :(

Any ideas would be of tremendous assistance!

Upvotes: 0

Views: 844

Answers (3)

František Žiačik

Reputation: 7614

How about this one:

^(_(?P<tag>.*?)_)?((?P<sort>\d\d)_)?(?P<name>[^.]*)?.*([.](?P<ext>[^.]*))$
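
To see what this captures, here is a small sketch that runs the pattern over the examples from the question. The `~` delimiters, the `$pattern` and `$examples` variables and the `parts()` helper are my own additions for illustration. Optional groups that do not take part in the match come back as empty strings.

```php
<?php
// Pattern from this answer, wrapped in ~ delimiters (an assumption; any delimiter works).
$pattern = '~^(_(?P<tag>.*?)_)?((?P<sort>\d\d)_)?(?P<name>[^.]*)?.*([.](?P<ext>[^.]*))$~';

// Hypothetical helper: returns only the four named parts, or null if nothing matches.
function parts(string $pattern, string $file): ?array
{
    if (!preg_match($pattern, $file, $m)) {
        return null;
    }
    return [
        'tag'  => $m['tag'],
        'sort' => $m['sort'],
        'name' => $m['name'],
        'ext'  => $m['ext'],
    ];
}

$examples = ['_taggy_01_foo_bar.text', '69_something.gif', '_tag_some_thing.jpg', 'basic.example'];

foreach ($examples as $file) {
    // One line per file name, with the parts shown as JSON for readability.
    echo $file, ' => ', json_encode(parts($pattern, $file)), "\n";
}
```

For the first example this gives tag `taggy`, sort `01`, name `foo_bar` and ext `text`. Note that the ext group here does not include the dot.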

Upvotes: 1

jrn.ak

Reputation: 36637

Update: Works in all example cases (and with multiple file extensions).

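<?php
    // Optional _tag_, optional two-digit sort prefix, the name, then one or more
    // extensions. A repeated group keeps only its last match, so ext ends up
    // holding the final extension.
    $pattern = '~^(?:_(?P<tag>[A-Za-z0-9]+)_)?(?:(?P<sort>\d{2})?_)?(?P<name>\w+)(?P<ext>[.]\w+)+$~';
    $tests = array(
        "_taggy_01_foo_bar.text",
        "69_something.gif",
        "_tag_some_thing.jpg",
        "basic.example",
        "_loltag_00_pretty_name.extone.exttwo.extthree"
    );

    foreach ($tests as $item) {
        // Each file name is a single subject, so preg_match() is enough here.
        preg_match($pattern, $item, $matches);
        print_r($matches);
    }
?>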

Output:

Array
(
    [0] => _taggy_01_foo_bar.text
    [tag] => taggy
    [1] => taggy
    [sort] => 01
    [2] => 01
    [name] => foo_bar
    [3] => foo_bar
    [ext] => .text
    [4] => .text
)
Array
(
    [0] => 69_something.gif
    [tag] => 
    [1] => 
    [sort] => 69
    [2] => 69
    [name] => something
    [3] => something
    [ext] => .gif
    [4] => .gif
)
Array
(
    [0] => _tag_some_thing.jpg
    [tag] => tag
    [1] => tag
    [sort] => 
    [2] => 
    [name] => some_thing
    [3] => some_thing
    [ext] => .jpg
    [4] => .jpg
)
Array
(
    [0] => basic.example
    [tag] => 
    [1] => 
    [sort] => 
    [2] => 
    [name] => basic
    [3] => basic
    [ext] => .example
    [4] => .example
)
Array
(
    [0] => _loltag_00_pretty_name.extone.exttwo.extthree
    [tag] => loltag
    [1] => loltag
    [sort] => 00
    [2] => 00
    [name] => pretty_name
    [3] => pretty_name
    [ext] => .extthree
    [4] => .extthree
)
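
Each capture shows up twice in the output above, once under its name and once under a numeric index. If only the named parts are wanted, one possible follow-up is to filter on string keys and trim the leading dot from the extension. This is only a sketch, and `parse_file_name()` is a hypothetical helper name, not part of the code above.

```php
<?php
// Hypothetical helper built around the pattern from this answer.
function parse_file_name(string $file): ?array
{
    $pattern = '~^(?:_(?P<tag>[A-Za-z0-9]+)_)?(?:(?P<sort>\d{2})?_)?(?P<name>\w+)(?P<ext>[.]\w+)+$~';

    if (!preg_match($pattern, $file, $matches)) {
        return null; // the file name does not follow the convention
    }

    // Named groups have string keys; the duplicated numeric entries have int keys.
    $parts = array_filter($matches, 'is_string', ARRAY_FILTER_USE_KEY);

    // The ext group includes the dot, so strip it.
    $parts['ext'] = ltrim($parts['ext'], '.');

    return $parts;
}

print_r(parse_file_name('_loltag_00_pretty_name.extone.exttwo.extthree'));
```

Returning null for a non-matching name keeps the "does not follow the convention" case distinct from a name whose optional parts are simply empty.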

Upvotes: 2

NikiC

Reputation: 101946

'~^(?:_(?<tag>\w+?)_)?(?:(?<sort>\d{2})_)?(?<name>[^.]+)\.(?<ext>\w+)$~'

But I'm not really sure I understood which parts are optional and which are not.
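
Since `\w` also matches digits and underscores, how much a `\w`-based tag group captures depends on whether its quantifier is greedy or lazy. The sketch below compares the two on the first example from the question. Both patterns and the `tag_sort_name()` helper are illustrative assumptions; the patterns differ only in `\w+` versus `\w+?`.

```php
<?php
// Two illustrative patterns that differ only in the tag quantifier.
$greedy = '~^(?:_(?<tag>\w+)_)?(?:(?<sort>\d{2})_)?(?<name>[^.]+)\.(?<ext>\w+)$~';
$lazy   = '~^(?:_(?<tag>\w+?)_)?(?:(?<sort>\d{2})_)?(?<name>[^.]+)\.(?<ext>\w+)$~';

// Hypothetical helper: "tag|sort|name" for a file name, or null when there is no match.
function tag_sort_name(string $pattern, string $file): ?string
{
    if (!preg_match($pattern, $file, $m)) {
        return null;
    }
    return $m['tag'] . '|' . $m['sort'] . '|' . $m['name'];
}

$file = '_taggy_01_foo_bar.text';

echo tag_sort_name($greedy, $file), "\n"; // taggy_01_foo||bar
echo tag_sort_name($lazy, $file), "\n";   // taggy|01|foo_bar
```

The greedy version lets the tag swallow the sort number and most of the name, while the lazy one stops at the first underscore. Neither variant accepts more than one extension, because the name group excludes dots and only a single `\.ext` may follow it.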

Upvotes: 1
