Reputation: 141
I want to replace some of the diacritics in a file with their ASCII equivalents. Please note that I don't want to remove all the diacritics: only those that come before the first "@" character of each line.
In the simplified version of the file below (a.glo), there are four "é" (the ones before the first "@") to replace with "e". The (probably ugly) regex I use is:
(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+
and it works with online testers such as www.regex101.com/ and in Notepad++!
But nothing is changed when I type in the Windows command line:
perl -pi -i.bak -e "s/(\\glossaryentry\{(\w|\s|\.)*)(é|è|ê|ë|É|È|Ê|Ë|ē)+/$1e/g" a.glo
(fwiw, on my system, perl is v.5.20.2)
a.glo:
\glossaryentry{AHRF@ {\memgloterm{AHRF}}{\memglodesc{Annales historiques de la Révolution française}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{Ass. plén.@ {\memgloterm{Ass. plén.}}{\memglodesc{Assemblée plénière}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{Ch. réun.@ {\memgloterm{Ch. réun.}}{\memglodesc{Chambres réunies}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{chron.@ {\memgloterm{chron.}}{\memglodesc{chronique}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{Circ. min.@ {\memgloterm{Circ. min.}}{\memglodesc{Circulaire ministérielle}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{éd.@ {\memgloterm{éd.}}{\memglodesc{édition, édité par}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{Int J Semiot Law@ {\memgloterm{Int J Semiot Law}}{\memglodesc{International Journal for the Semiotics of Law - Revue internationale de sémiotique juridique}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{Oxford J Legal Studies@ {\memgloterm{Oxford J Legal Studies}}{\memglodesc{Oxford Journal of Legal Studies}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{préc.@ {\memgloterm{préc.}}{\memglodesc{précité}} {\memgloref{}}|memjustarg}{1}
\glossaryentry{Rev. adm.@ {\memgloterm{Rev. adm.}}{\memglodesc{Revue administrative}} {\memgloref{}}|memjustarg}{1}
Upvotes: 4
Views: 288
Reputation:
I tried this on a Windows box, and it works.
I think, though, that the file has to be opened in its correct encoding.
I saved your text sample as ANSI text.
perl -pi -i.bak -e "s/(\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+/$1e/g" a.glo
# (\\glossaryentry\{[\w\s.]*)[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+
( # (1 start)
\\ glossaryentry \{
[\w\s.]*
) # (1 end)
[\x{E9}\x{E8}\x{EA}\x{EB}\x{C9}\x{C8}\x{CA}\x{CB}\x{113}]+
Upvotes: 2