Reputation: 13
I want to remove all occurrences of the strings <!-- and --> from an XML file EXCEPT for the first pair, which surrounds a comment that I want to keep. I do not want to delete any text enclosed by these strings. The strings all occur on separate lines. I am able to delete all instances of a string by using the proposals in "Delete certain lines in a txt file via a batch file", but I am not sure of the best way (using a for loop?) to skip the first ones.
The XML looks like this:
<?xml version="1.0"?>
<!--
REVISION HISTORY and file descriptions which I want to keep commented
-->
<!--
some code I want to uncomment
-->
<!--
some more code I want to uncomment
-->
Upvotes: 1
Views: 704
Reputation: 34899
The original answer is below; here is a much simpler approach, developed for the task at hand:
Here is a pure batch-file solution, based on the findstr command -- let us call it remove-lines.bat:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "FILE=%~1" & rem // 1st argument is the original file
set "FILE_NEW=%~2" & rem // 2nd argument is the modified file
set "SKIP_UNTIL=-->" & rem // don't modify lines up to 1st occurrence
set REMOVE="<^!--" "-->" & rem // no `?` and `*` allowed here!
rem // `%` --> `%%` & `!` --> `^!`
if defined FILE (set FILE="%FILE%") else set "FILE="
if not defined FILE_NEW set "FILE_NEW=con"
> "%FILE_NEW%" (
    set "FLAG="
    for /F "delims=" %%L in ('findstr /N /R "^" %FILE%') do (
        set "LINE=%%L"
        setlocal EnableDelayedExpansion
        set "LINE=!LINE:*:=!"
        if defined FLAG (
            set "FOUND="
            for %%S in (!REMOVE!) do (
                echo(| set /P "=_!LINE!" | > nul findstr /L /M /C:"_%%S"
                if not ErrorLevel 1 set "FOUND=#"
            )
            if not defined FOUND echo(!LINE!
        ) else (
            echo(!LINE!
        )
        echo(| set /P "=_!LINE!" | > nul findstr /L /M /C:"_!SKIP_UNTIL!"
        if ErrorLevel 1 (endlocal) else endlocal & set "FLAG=#"
    )
)
endlocal
exit /B
Basically, the script reads the text file with the for /F %%L loop 1). In the body of this loop, there is a standard for %%S loop which iterates through the strings defined by the variable REMOVE. Inside this loop, the variable FOUND is set as soon as any one of the strings has been found in the current line 2). After the loop, the line is returned only if FOUND is still empty, meaning that none of the strings has been found. All this searching is done only if the variable FLAG is set, which happens as soon as the string in the variable SKIP_UNTIL is encountered 2) for the first time. Since this search is done after the check of the variable FLAG, the inner loop does not process the affected line itself. Every line read is returned unedited as long as FLAG is unset.
1) Such a loop ignores empty lines; to overcome that, the findstr command temporarily prefixes every line with its line number and a colon, which are later removed in the body of the loop; this way empty lines are not lost.
2) If you want to force the search string to occur at the beginning or at the end of the current line, add the respective switch /B or /E to the findstr command; to force the entire line to match the search string, add the /X switch.
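The flag logic described above can also be sketched outside of batch. The following JavaScript is a minimal illustration, not a port of the script: the function name removeLinesAfterFirst and its parameters are hypothetical, Node.js is assumed as the runtime, and a plain case-sensitive substring test (String.prototype.includes) stands in for the findstr search, which is an assumption about how strict the match should be.

```javascript
// Hypothetical helper mirroring the FLAG / FOUND logic of remove-lines.bat.
// skipUntil: lines up to and including its first occurrence are kept as-is.
// remove:    afterwards, any line containing one of these strings is dropped.
function removeLinesAfterFirst(lines, skipUntil, remove) {
  const out = [];
  let flag = false; // plays the role of FLAG in the batch script
  for (const line of lines) {
    if (flag) {
      // plays the role of FOUND: does the line contain any string to remove?
      const found = remove.some((s) => line.includes(s));
      if (!found) out.push(line);
    } else {
      out.push(line);
    }
    // Checked after the output step, so the line holding the first
    // occurrence of skipUntil is itself still returned unedited.
    if (line.includes(skipUntil)) flag = true;
  }
  return out;
}

// Sample data taken from the question.
const sampleLines = [
  '<?xml version="1.0"?>',
  '<!--',
  'REVISION HISTORY and file descriptions which I want to keep commented',
  '-->',
  '<!--',
  'some code I want to uncomment',
  '-->',
];
console.log(removeLinesAfterFirst(sampleLines, '-->', ['<!--', '-->']).join('\n'));
```

Like the batch script, this drops whole lines rather than editing them, so it only suits input where the markers sit on lines of their own.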
To use it for an XML file, say data.xml in the current directory, and to write the result into the file data_new.xml at the same location, type the following command line:
"remove-lines.bat" "data.xml" "data_new.xml"
This is the original answer. It describes a rather complicated approach with two scripts, one calling the other; it was done that way because the first (sub-)script was already available (although it had been developed for something completely different):
Here is a pure batch-file solution, based on a simple but quite flexible search-and-replace script -- let us call it search+replace.bat:
@echo off
setlocal DisableDelayedExpansion
rem /* Define pairs of search/replace strings here, separated by spaces,
rem    each one in the format `"<search_string>=<replace_string>"`;
rem    the `""` are mandatory; `=` separates search from replace string;
rem    the replace string may be empty, but the search string must not;
rem    if the `=` is omitted, the whole string is taken as search string;
rem    both strings must not contain the characters `=`, `*`, `?` and `"`;
rem    the search string must not begin with `~`;
rem    exclamation marks must be escaped like `^!`;
rem    percent signs must be doubled like `%%`;
rem    the search is done in a case-insensitive manner;
rem    the replacements are done in the given order: */
set STRINGS="<^!--=" "-->="
set "FILE=%~1"
rem // provide a file by command line argument;
rem // if none is given, the console input is taken;
if defined FILE (set FILE="%FILE%") else set "FILE="
set "SKIP=%~2"
rem // provide number of lines to skip optionally;
set /A SKIP+=0
for /F "delims=" %%L in ('findstr /N /R "^" %FILE%') do (
    set "LINE=%%L"
    for /F "delims=:" %%N in ("%%L") do set "LNUM=%%N"
    setlocal EnableDelayedExpansion
    set "LINE=!LINE:*:=!"
    if !LNUM! GTR %SKIP% (
        for %%R in (!STRINGS!) do (
            if defined LINE (
                for /F "tokens=1,2 delims== eol==" %%S in ("%%~R") do (
                    set "LINE=!LINE:%%S=%%T!"
                )
            )
        )
    )
    echo(!LINE!
    endlocal
)
endlocal
exit /B
Basically, the script reads the text file with the for /F %%L loop 3). In the body of this loop, there is a standard for %%R loop which iterates through the search/replace string pairs defined by the variable STRINGS. Inside this one, each string pair is split into search and replace strings by another for /F %%S loop 4). The actual string replacement is done using the standard sub-string replacement syntax -- type set /? for details.
3) Such a loop ignores empty lines; to overcome that, the findstr command temporarily prefixes every line with its line number and a colon, which are later removed in the body of the loop; this way empty lines are not lost.
4) This splits the pair at the (first) = sign; the two parts are then put together again with an = sign in between. This is usually not necessary, but it is done to avoid trouble when no = sign is given.
The STRINGS variable is adapted to your needs, that is, to remove the literal strings <!-- and --> (or, in other words, to replace them by empty strings) -- see the related remark at the top of the script.
To use it for an XML file, say data.xml in the current directory, type the following command line:
"search+replace.bat" "data.xml" 0
The resulting text is written to the console window. To put it into a file, use redirection:
("search+replace.bat" "data.xml" 0)> "data_new.xml"
Note that you must not specify the same file for both input and output.
The 0 (which can be omitted) is an optional argument that specifies how many lines from the beginning are excluded from processing. These lines are returned unedited.
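For readers who find the nested batch loops hard to follow, the same idea (ordered search/replace pairs, applied case-insensitively, with an optional number of leading lines left alone) might look like this in JavaScript. This is only a sketch under assumptions: the name searchReplace and its parameters are hypothetical, Node.js is assumed as the runtime, and none of the batch-specific restrictions on special characters are modelled.

```javascript
// Hypothetical helper mirroring the STRINGS / SKIP logic of search+replace.bat.
// pairs: array of [search, replace]; applied in the given order.
// skip:  number of leading lines that are returned unedited (default 0).
function searchReplace(lines, pairs, skip = 0) {
  // Escape regex metacharacters so each search string is matched literally.
  const escapeRegExp = (s) => s.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
  return lines.map((line, index) => {
    // Line numbers are 1-based here, as they are with findstr /N.
    if (index + 1 <= skip) return line;
    let result = line;
    for (const [search, replace] of pairs) {
      // 'g' replaces every occurrence, 'i' makes the search case-insensitive;
      // the function form keeps `$` in the replacement text literal.
      result = result.replace(new RegExp(escapeRegExp(search), 'gi'), () => replace);
    }
    return result;
  });
}

// Remove both comment markers, but leave the first three lines alone.
const editedLines = searchReplace(
  ['<!--', 'keep me commented', '-->', '<!--', 'some code', '-->'],
  [['<!--', ''], ['-->', '']],
  3
);
console.log(editedLines);
```

As with the batch script, lines that consisted only of a marker become empty rather than disappearing, so a separate step is still needed to drop them.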
Removing strings from a text file may leave several empty lines behind, as with your sample XML data. To get rid of them, you could use the following command line (entered at the command prompt):
(for /F delims^=^ eol^= %F in ('^""search+replace.bat" "data.xml" 0^"') do @echo(%F) > "data_new.xml"
To use this code snippet in a batch file, you need to double the % signs (that is, write %%F instead of %F).
Since you want to keep the first <!-- / --> comment (and there are not multiple comments within a single line, according to your sample data), you could use the following script. It determines the number of the first line in data.xml containing -->, then calls search+replace.bat with the file and that line number as arguments, captures the output of the script, removes any empty lines and writes the result to the new file data_new.xml:
@echo off
setlocal EnableExtensions DisableDelayedExpansion
rem // Define constants here:
set "FILE=data.xml"
set "FILE_NEW=data_new.xml"
set "SEEK_TEXT=-->"
set "FIRST=#" &rem (set to empty string for last occurrence)
rem // Search for the first (or last) occurrence of the text in SEEK_TEXT:
set /A LINE_NUM=0
for /F "delims=:" %%N in ('
    findstr /N /L /C:"%SEEK_TEXT%" "%FILE%"
') do (
    set "LINE_NUM=%%N"
    if defined FIRST goto :CONTINUE
)
:CONTINUE
rem // Call sub-script to search and replace (remove) strings,
rem // remove all empty lines and write result to new file:
(
    for /F delims^=^ eol^= %%F in ('
        ^""%~dp0search+replace.bat" "%FILE%" %LINE_NUM%^"
    ') do (
        echo(%%F
    )
) > "%FILE_NEW%"
endlocal
exit /B
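The two-step idea used here (find the line number of the first -->, then edit only the lines after it and drop empty lines) can be condensed into a short JavaScript sketch. The names firstLineContaining and uncommentAfterFirst are hypothetical, Node.js is assumed as the runtime, and the marker removal is a plain case-sensitive literal replacement, which is sufficient for <!-- and --> but is not a port of the batch script.

```javascript
// Hypothetical helper: 1-based number of the first line containing `text`,
// or 0 if there is none (the batch script likewise starts with LINE_NUM=0).
function firstLineContaining(lines, text) {
  // findIndex returns -1 when nothing matches, which yields 0 here.
  return lines.findIndex((line) => line.includes(text)) + 1;
}

// Hypothetical helper: strip the markers from every line after the first
// line containing seekText, then drop all lines that are empty.
function uncommentAfterFirst(lines, markers, seekText) {
  const skip = firstLineContaining(lines, seekText);
  return lines
    .map((line, i) =>
      i < skip
        ? line // within the first comment: returned unedited
        : markers.reduce((l, m) => l.split(m).join(''), line)
    )
    .filter((line) => line !== '');
}

// Sample data taken from the question.
const xmlLines = [
  '<?xml version="1.0"?>',
  '<!--',
  'REVISION HISTORY and file descriptions which I want to keep commented',
  '-->',
  '<!--',
  'some code I want to uncomment',
  '-->',
  '<!--',
  'some more code I want to uncomment',
  '-->',
];
console.log(uncommentAfterFirst(xmlLines, ['<!--', '-->'], '-->').join('\n'));
```

Note that the final filter removes every empty line, including any that were already present in the kept header, which matches what the for /F wrapper does to the sub-script's output.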
Upvotes: 0
Reputation: 24466
The best way of handling any structured markup language (XML, HTML, JSON, etc.) is to parse it with the appropriate interpreter. Hacking and scraping it as flat text invites trouble if the formatting ever changes. Save this with a .bat extension and give it a shot.
@if (@CodeSection == @Batch) @then
@echo off
setlocal
set "infile=test.xml"
set "outfile=test.xml"
cscript /nologo /e:Jscript "%~f0" "%infile%" "%outfile%" && echo Done.
goto :EOF
@end // end batch / begin JScript
var DOM = WSH.CreateObject('Msxml2.DOMDocument.6.0'),
    args = { load: WSH.Arguments(0), save: WSH.Arguments(1) };
// load synchronously; async has to be switched off before calling load()
DOM.async = false;
DOM.load(args.load);
// sanity check the XML
if (DOM.parseError.errorCode) {
    var e = DOM.parseError;
    WSH.StdErr.WriteLine('Error in ' + args.load + ' line ' + e.line + ' char '
        + e.linepos + ':\n' + e.reason + '\n' + e.srcText);
    WSH.Quit(1);
}
var comments = DOM.documentElement.selectNodes('//comment()');
// This will delete all but the first comment
// (nothing is deleted if there are fewer than two comments).
for (var i = comments.length - 1; i > 0; i--) {
    comments[i].parentNode.removeChild(comments[i]);
}
DOM.save(args.save);
Edit: I guess if you're working with invalid XML, then manipulating the text as flat text is probably the best solution. Here's a modified version that does this:
@if (@CodeSection == @Batch) @then
@echo off
setlocal
set "infile=test.xml"
set "outfile=test2.xml"
cscript /nologo /e:Jscript "%~f0" "%infile%" "%outfile%" && echo Done.
goto :EOF
@end // end batch / begin JScript
var args = { load: WSH.Arguments(0), save: WSH.Arguments(1) },
    fso = WSH.CreateObject('Scripting.FileSystemObject'),
    fHand = fso.OpenTextFile(args.load, 1),
    matches = 0,
    XML = fHand.ReadAll().replace(/<!--|-->/g, function(m) {
        return (matches++ > 1) ? '' : m;
    });
fHand.Close();
fHand = fso.CreateTextFile(args.save, true);
fHand.Write(XML);
fHand.Close();
Or if you prefer PowerShell, here's a Batch + PowerShell hybrid script that does the same thing using the same logic.
<# : batch portion
@echo off
setlocal
set "infile=test.xml"
set "outfile=test2.xml"
powershell "iex (${%~f0} | out-string)" && echo Done.
goto :EOF
: end Batch / begin PowerShell hybrid code #>
[regex]::replace(
    (gc $env:infile | out-string),
    "<!--|-->",
    {
        if ($matches++ -gt 1) {
            ""
        } else {
            $args[0].Value
        }
    }
) | out-file $env:outfile -force
Upvotes: 1