Carg

Reputation: 141

Replace some diacritics with perl regex

I want to replace some of the diacritics in a file with their ASCII equivalents. Note that I don't want to remove all the diacritics: only those that come before the first "@" character of each line.

In the simplified version of the file below (a.glo), there are four "é" to replace with "e": the ones before the "@" in "Ass. plén.", "Ch. réun.", "éd." and "préc.". The (probably ugly) regex I use is:

(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+

and it works in online testers such as www.regex101.com/ and in Notepad++!

But nothing is changed when I type in the Windows command line:

perl -pi -i.bak -e "s/(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+/$1e/g" a.glo

(fwiw, on my system, perl is v.5.20.2)

a.glo:

\glossaryentry{AHRF@ {\memgloterm{AHRF}}{\memglodesc{Annales historiques de la Révolution française}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Ass. plén.@ {\memgloterm{Ass. plén.}}{\memglodesc{Assemblée plénière}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Ch. réun.@ {\memgloterm{Ch. réun.}}{\memglodesc{Chambres réunies}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{chron.@ {\memgloterm{chron.}}{\memglodesc{chronique}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Circ. min.@ {\memgloterm{Circ. min.}}{\memglodesc{Circulaire ministérielle}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{éd.@ {\memgloterm{éd.}}{\memglodesc{édition, édité par}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Int J Semiot Law@ {\memgloterm{Int J Semiot Law}}{\memglodesc{International Journal for the Semiotics of Law - Revue internationale de sémiotique juridique}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Oxford J Legal Studies@ {\memgloterm{Oxford J Legal Studies}}{\memglodesc{Oxford Journal of Legal Studies}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{préc.@ {\memgloterm{préc.}}{\memglodesc{précité}} {\memgloref{}}|memjustarg}{1}

\glossaryentry{Rev. adm.@ {\memgloterm{Rev. adm.}}{\memglodesc{Revue administrative}} {\memgloref{}}|memjustarg}{1}

Upvotes: 4

Views: 288

Answers (1)

user557597


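I tried this on a Windows box and it works.
I think, though, that the file's encoding matters: the pattern has to match the characters as they are actually stored in the file.
I saved your text sample as ANSI text, where "é" is the single byte 0xE9, which is what \x{E9} matches.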

perl -pi -i.bak -e "s/(\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+/$1e/g" a.glo

 # (\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+

 (                             # (1 start)
      \\ glossaryentry \{
      [\w\s.]* 
 )                             # (1 end)
 [\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+ 

Upvotes: 2
