Shawn Northrop

Reputation: 6044

Regex: Rename Files

I am trying to rename a bunch of image files.

They are named inconsistently, but there is some logic to the names.

They all start with an Id number

After the Id there may be some of the following (Items To Be Removed):

These can appear in any order, and the space or dash can appear more than once.

The filenames may have any of these items but not necessarily all of them.

Some filenames do have all 3 items.

They may have an additional _ after this

Then they may have a number {Index}

Finally they end in .ext where ext = jpg|png|gif...

Here are some example filenames:

I am trying to remove/replace the mentioned items so the filenames are as follows:

ID.ext or ID_{index}.ext

So the above list would turn into:

I have tried writing a few expressions but am a little stumped on this one.

I am working on a PHP project though other languages would be fine for this script.

Upvotes: 1

Views: 647

Answers (3)

mickmackusa

Reputation: 48071

Pattern: /^\d+\K[-a-z_ ]+/i Replace: _ (Pattern Demo)

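The pattern only matches when one or more separator characters (hyphens, letters, underscores, or spaces) follow the ID, and it replaces that whole run with a single underscore. Names without such a run, like 1227.jpg, are left untouched.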

/           #pattern delimiter
^           #start of string
\d+         #one or more digits
\K          #reset the start of the match, so the digits are kept and only what follows is replaced
[-a-z_ ]+   #match one or more hyphens, letters, underscores, or spaces
/           #pattern delimiter
i           #make the pattern case-insensitive

Code: (Demo)

$images=['1227.jpg','1227_1.jpg','2200 WH-1.jpg','2200WH 2.jpg','2200 WH2.jpg','2201_BK 1.png','2203 RD_1.jpg'];
var_export(preg_replace('/^\d+\K[-a-z_ ]+/i','_',$images));

Output:

array (
  0 => '1227.jpg',
  1 => '1227_1.jpg',
  2 => '2200_1.jpg',
  3 => '2200_2.jpg',
  4 => '2200_2.jpg',
  5 => '2201_1.png',
  6 => '2203_1.jpg',
)
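Note that two of the sample names ('2200WH 2.jpg' and '2200 WH2.jpg') both become '2200_2.jpg', so renaming files on disk needs a collision check. The following is a minimal sketch of one way to apply the pattern to real files, not part of the original answer: the helper name planRenames() and the images directory are assumptions made for illustration.

```php
<?php
// Hypothetical helper: maps each original name to its cleaned name and
// reports targets that a second, different source file would also claim.
function planRenames(array $names): array
{
    $plan = [];
    $seen = [];
    $collisions = [];
    foreach ($names as $old) {
        $new = preg_replace('/^\d+\K[-a-z_ ]+/i', '_', $old);
        if (isset($seen[$new]) && $seen[$new] !== $old) {
            $collisions[$new][] = $old; // another file already maps to this name
            continue;
        }
        $seen[$new] = $old;
        if ($new !== $old) {
            $plan[$old] = $new; // only names that actually change
        }
    }
    return ['plan' => $plan, 'collisions' => $collisions];
}

$images = ['1227.jpg', '1227_1.jpg', '2200 WH-1.jpg', '2200WH 2.jpg', '2200 WH2.jpg'];
$result = planRenames($images);

// Assumed location of the files; nothing is renamed unless the source exists
// and the target name is still free.
$dir = __DIR__ . '/images';
foreach ($result['plan'] as $old => $new) {
    $from = $dir . '/' . $old;
    $to = $dir . '/' . $new;
    if (is_file($from) && !file_exists($to)) {
        rename($from, $to);
    }
}
```

Files listed under 'collisions' are skipped on purpose, so they can be reviewed by hand instead of overwriting an earlier file.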

Solution for the extended question, which also covers names with no index (2200 WH.jpg) and zero-padded indexes (3000_01.jpg): (Demo) (Demo)

You can do it with two patterns and replacements on a single preg_replace() call or you can use preg_replace() then str_replace() to mop up the dangling underscores. This will come down to personal coding preference. (It could also be done with a preg_replace_callback() that checks if there is an index number in the image name before adding the underscore, but that will make a more convoluted snippet.)

Code:

$images=['1227.jpg','1227_1.jpg','2200 WH-1.jpg','2200WH 2.jpg','2200 WH2.jpg','2201_BK 1.png','2203 RD_1.jpg','2200 WH.jpg','3000_01.jpg'];
foreach($images as $image){
    echo str_replace('_.','.',preg_replace('/^\d+\K[-a-z_ ]+0*/i','_',$image)),"\n";
}

Or

$images=['1227.jpg','1227_1.jpg','2200 WH-1.jpg','2200WH 2.jpg','2200 WH2.jpg','2201_BK 1.png','2203 RD_1.jpg','2200 WH.jpg','3000_01.jpg'];
foreach($images as $image){
    echo preg_replace(['~^\d+\K[-a-z_ ]+0*~i','~_\.~'],['_','.'],$image),"\n";
}
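For comparison, the preg_replace_callback() variant mentioned above could look roughly like this. It is a sketch rather than the answerer's own code: the pattern adds two capture groups (ID and index) so the callback can decide whether an underscore is needed at all.

```php
<?php
$images = ['1227.jpg', '1227_1.jpg', '2200 WH-1.jpg', '2200WH 2.jpg', '2200 WH2.jpg',
           '2201_BK 1.png', '2203 RD_1.jpg', '2200 WH.jpg', '3000_01.jpg'];

// Group 1 is the ID; group 2 is the index with leading zeros dropped (may be empty).
$clean = preg_replace_callback(
    '~^(\d+)[-a-z_ ]+0*(\d*)~i',
    function (array $m): string {
        // Only add the underscore when an index is actually present.
        return $m[2] === '' ? $m[1] : $m[1] . '_' . $m[2];
    },
    $images
);

echo implode("\n", $clean), "\n";
```

This avoids the dangling underscore instead of mopping it up afterwards, at the cost of a slightly longer pattern.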

Upvotes: 2

Doxterpepper

Reputation: 11

Not a PHP person but the regular expression I would use is:

/(\d+).*?(\d?)\.(.*)/

This will capture the first set of numbers, skip the middle part, capture the number on the end if present, then capture the file extension.

Then in ruby I would do the following:

id, index, extension = my_file_name.match(/(\d+).*?(\d?)\.(.*)/).captures
new_name = id.to_s
new_name += "_#{index}" unless index.empty?
new_name += ".#{extension}"
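Since the question is about a PHP project, the same capture-based idea can be sketched with preg_match(). The function name cleanName() is made up for illustration, and returning null for non-matching names is an assumption about the desired behaviour.

```php
<?php
// Hypothetical helper: rebuilds the name from the captured ID, index and extension.
// Returns null when the name contains no ID followed by a period.
function cleanName(string $name): ?string
{
    if (!preg_match('/(\d+).*?(\d?)\.(.*)/', $name, $m)) {
        return null;
    }
    [, $id, $index, $extension] = $m; // skip $m[0], the full match
    $new = $id;
    if ($index !== '') {
        $new .= '_' . $index; // underscore only when an index was captured
    }
    return $new . '.' . $extension;
}

foreach (['1227.jpg', '2200 WH-1.jpg', '2201_BK 1.png', '2200 WH.jpg'] as $name) {
    echo $name, ' => ', cleanName($name), "\n";
}
```

Because the index group is (\d?), only the last digit before the period is kept, so a two-digit index such as 12 would be shortened to 2.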

Upvotes: 1

GrumpyCrouton

Reputation: 8620

I would do it with the following pattern:

(\d{4})([^0-9.]*)(\d\.)

And with a substitution of $1_$3.

Step by step:

  • (\d{4}) - Capture the four-digit ID.
  • ([^0-9.]*) - Capture everything after the ID that is not a digit or a period.
  • (\d\.) - Capture the index digit and the period before the extension (this is so we can properly place the underscore).

With this substitution, the four-digit ID is kept, every character between the ID and the index that is not a digit or a period is dropped, and an underscore is placed between the ID and the index. The period is captured in the third group, so the substitution puts it back. If there is no index digit before the period (for example 1227.jpg), the pattern does not match and the name is left unchanged.

You can view this on Regex101 for a very detailed step-by-step of what is going on.

In PHP this would be:

preg_replace('/(\d{4})([^0-9.]*)(\d\.)/', '$1_$3', $string);

Output:

  • 1227.jpg
  • 1227_1.jpg
  • 2200_1.jpg
  • 2200_2.jpg
  • 2200_2.jpg
  • 2201_1.png
  • 2203_1.jpg
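To see which names the pattern touches, here is a small sketch that runs the pattern and the $1_$3 substitution from this answer over a few names and uses the $count argument of preg_replace() to flag names that did not match. The name '2200 WH.jpg' is an extra test case added for illustration.

```php
<?php
$names = ['1227.jpg', '2200 WH-1.jpg', '2201_BK 1.png', '2200 WH.jpg'];
$results = [];
foreach ($names as $name) {
    // $count receives the number of replacements made (0 means no match).
    $new = preg_replace('/(\d{4})([^0-9.]*)(\d\.)/', '$1_$3', $name, -1, $count);
    $results[$name] = $new;
    echo $name, ' => ', $new, $count ? '' : ' (no match, unchanged)', "\n";
}
```

A name with letters but no index, such as '2200 WH.jpg', is not matched and therefore keeps its suffix, so this pattern only fits file sets where every name that needs cleaning ends in an index digit.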

Upvotes: 1
