user1709655

Reputation: 53

Regex for removing percentage

Hi, I would really appreciate some help forming a regex that removes a percentage from the end of a string:

Film name (2009) 58%  ->  Film name (2009)
Film name (2010) 59%  ->  Film name (2010)

The string may or may not have the bracketed year. Before the bracketed year, the film name may be alphanumeric and have multiple words.

I am using Bulk Rename Utility, so I am looking to fill in the 'match' and 'replace' fields.

The best I could come up with was:

([A-Z][a-z]*) \((\d*)\) (\d*\%) -->  \1 (\2)

though this only seemed to work with single-word film names, and it lost the brackets, so I had to re-add them!
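To see why the attempt only handles single-word names, here is a rough reproduction that uses Python's `re` module as a stand-in for PCRE. This is an assumption: Bulk Rename Utility's engine may differ in detail, and `apply_attempt` is a hypothetical helper. `[A-Z][a-z]*` matches exactly one capitalised word, which must be followed directly by ` (`, so a second word breaks the match. The literal brackets sit outside the capture groups, which is why they have to be written back in the replacement.

```python
import re

# The "match" and "replace" fields from the question, unchanged.
QUESTION_PATTERN = r"([A-Z][a-z]*) \((\d*)\) (\d*\%)"
QUESTION_REPLACEMENT = r"\1 (\2)"

def apply_attempt(name: str) -> str:
    """Run the question's substitution on one file name (hypothetical helper)."""
    return re.sub(QUESTION_PATTERN, QUESTION_REPLACEMENT, name)

print(repr(apply_attempt("Film (2009) 58%")))       # 'Film (2009)'
print(repr(apply_attempt("Film name (2009) 58%")))  # 'Film name (2009) 58%' (no match)
```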

I've googled, and every time I try a possible expression it doesn't work in Bulk Rename Utility, which I believe is based on PCRE.

Upvotes: 5

Views: 7499

Answers (5)

Hugo

Reputation: 1672

Here is my proposal:

^([1-9]([0-9])*?|0)(\.[0-9]+)?%?$

It matches "12", "0.123", "12.44" and "102.12345", and also values with a % at the end, such as "11.22%" and "11%".

It accepts any number of digits before and after the decimal point, followed by a "%" character at the end (the decimal part and the % are both optional, of course).

Hope it helps ;)
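Here is a small Python check of which strings this pattern accepts. Python's `re` stands in for PCRE here, and `is_percentage` is a hypothetical helper. Because of the `^` and `$` anchors, the pattern tests whether the whole string is a number, so a complete film name does not match.

```python
import re

# Hugo's pattern: the entire string must be a number, optionally ending in %.
PERCENT = re.compile(r"^([1-9]([0-9])*?|0)(\.[0-9]+)?%?$")

def is_percentage(text: str) -> bool:
    """Return True if the whole string is a (possibly decimal) percentage."""
    return PERCENT.match(text) is not None

for s in ["12", "0.123", "12.44", "102.12345", "11.22%", "11%", "Film name (2009) 58%"]:
    print(s, is_percentage(s))
```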

Upvotes: 0

Chuck Kollars

Reputation: 2175

You're fortunate that the percentage (if it exists) is always the very last thing. Simply use that as the key fact, and do not try to match anything else. (As a general rule with REs, matching stuff you're not going to change just adds chances for something to go wrong without providing any benefit; do it only if you must, to pin down the location of the part you're concerned with.)

My guess is that some of the previous answers were more or less right, but one didn't work because of a typo in all those '}', ')', '|' and '\' characters (regular expressions have to be exact: backslash is not forward slash, square bracket is not curly bracket is not parenthesis, plus is not star, lowercase is not uppercase, you cannot add whitespace anywhere, and so forth), and most didn't work because some of your strings have trailing spaces. So as your "match" field use
\s+(100|\d\d?)%\s*$
and have your "replace" field be completely empty.

One other thought: is it possible that some of your data has a space between the digits and the percent sign (like this: foo bar (2012) 83 %)? If so, modify the "match" field to allow for that:
\s+(100|\d\d?)\s*%\s*$
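Here is a sketch of both "match" patterns using Python's `re.sub` with an empty replacement. It assumes Python's regex syntax behaves like PCRE for these constructs. `strip_percentage`, `STRICT` and `LOOSE` are illustrative names and not part of Bulk Rename Utility.

```python
import re

# The "match" field from this answer; the "replace" field is empty.
STRICT = r"\s+(100|\d\d?)%\s*$"
# Variant that also allows whitespace between the digits and the % sign.
LOOSE = r"\s+(100|\d\d?)\s*%\s*$"

def strip_percentage(name: str, pattern: str = STRICT) -> str:
    """Remove a trailing 0-100 percentage plus the whitespace around it."""
    return re.sub(pattern, "", name)

print(repr(strip_percentage("Film name (2009) 58%")))        # 'Film name (2009)'
print(repr(strip_percentage("Film name (2010) 59%  ")))      # 'Film name (2010)'
print(repr(strip_percentage("foo bar (2012) 83 %")))         # 'foo bar (2012) 83 %'
print(repr(strip_percentage("foo bar (2012) 83 %", LOOSE)))  # 'foo bar (2012)'
```

The trailing `\s*` is what lets the first pattern cope with names that end in stray spaces.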

Upvotes: 0

Gabber

Reputation: 5452

To avoid replacing the wrong things, do this:

\b(100|\d{1,2})%\b

and replace it with nothing.

It stops at word boundaries (i.e. 30% is OK but w30% is not) and only matches 100 or numbers from 0 to 99.

EDIT:

If the % is the last character of the string, you can get a better result with

\b(100|\d{1,2})%$

This way you only match the % at the end of the line, and avoid removing numbers with % from the film title.
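Here is a quick comparison of the two patterns, using Python's `re` as a stand-in for PCRE (an assumption). A `\b` placed after a non-word character such as % only succeeds when a word character follows. As a result, the first pattern does not match a percentage at the very end of the name. The edited pattern does match it, but it leaves the space before the percentage in place.

```python
import re

# First version: the trailing \b needs a word character right after the %.
BOUNDARY = r"\b(100|\d{1,2})%\b"
# Edited version: anchored to the end of the string instead.
ANCHORED = r"\b(100|\d{1,2})%$"

name = "Film name (2009) 58%"
print(repr(re.sub(BOUNDARY, "", name)))  # 'Film name (2009) 58%' (no match)
print(repr(re.sub(ANCHORED, "", name)))  # 'Film name (2009) ' (space kept)
```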

If the string is a filename and you need to replace the whole thing rather than just removing part of the title, you can do this (I think [0-9] is accepted by more languages than \d):

(.+?)(100|[0-9]{1,2})%$

and replace with

$1

\1 and \2 should not be used in a replacement expression, at least in Perl. Inside a pattern they are backreferences that match whatever the first and second captures matched; $1 and $2 are variables that contain what the first and second captures matched, so you should use those in the replacement instead.
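Replacement syntax differs between tools, so check which form your tool expects before copying either one. As an illustration in Python (not Bulk Rename Utility): `re.sub` refers to groups as `\1` or `\g<1>`, and a `$1` in the replacement is inserted literally. The question's own attempt suggests that Bulk Rename Utility accepts `\1`. `CAPTURE_PATTERN` is just an illustrative name.

```python
import re

# The capture-based pattern from this answer, with the comment moved out of it.
CAPTURE_PATTERN = r"(.+?)(100|[0-9]{1,2})%$"

name = "Film name (2009) 58%"
# Python writes the group reference as \g<1> (or \1).
print(repr(re.sub(CAPTURE_PATTERN, r"\g<1>", name)))  # 'Film name (2009) '
# In Python, "$1" has no special meaning, so the whole match becomes "$1".
print(repr(re.sub(CAPTURE_PATTERN, "$1", name)))      # '$1'
```

Note that group 1 still ends with the space that preceded the percentage.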

Upvotes: 3

Borodin

Reputation: 126742

This is very simply done with

s/\s*\d+%$//

which removes a trailing string of digits followed by a percent sign, together with any preceding whitespace.

use strict;
use warnings;

while (<DATA>) {
  s/\s*\d+%$//;   # $ also matches just before the trailing newline, so the newline is kept
  print;
}

__DATA__
Film name (2009) 58%
Film name (2010) 59%

Output:

Film name (2009)
Film name (2010)

Upvotes: 4

choroba

Reputation: 241968

I am not familiar with the utility, but in a substitution, just replacing [0-9]+% with nothing should usually work. Be careful, though, if there are any films with percentages in their names!
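Here is a quick Python illustration of this suggestion and its caveat (the second title is hypothetical). `re.sub` replaces every occurrence, so a percentage inside the title disappears as well. The space before the removed percentage also stays.

```python
import re

def strip_all_percentages(name: str) -> str:
    """Replace every run of digits followed by % with nothing."""
    return re.sub(r"[0-9]+%", "", name)

print(repr(strip_all_percentages("Film name (2009) 58%")))  # 'Film name (2009) '
# Hypothetical title containing a percentage: both occurrences are removed.
print(repr(strip_all_percentages("50% Off (2011) 58%")))    # ' Off (2011) '
```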

Upvotes: 2
