I have an XML file and I need to extract testname from every instance of <con:testSuite name="testname" within the file.
I am not quite sure how to approach this, or whether this is possible in batch.
Here is what I have thought of so far:
1) Use FINDSTR and store every line that contains <con:testSuite name= in a variable or a temporary file, like this:
FINDSTR /C:"<con:testSuite name=" file.xml > tests.txt
2) Somehow use that file or variable to extract the strings
Note that there might be more than one instance of the matching string in the same line.
I am a novice at batch and any help is appreciated.
Parsing XML is very painful with batch. Batch is not a good text processor to begin with. However, with some amount of effort you can usually extract the data you want from a given XML file. But the input file could easily be rearranged into an equivalent valid XML form that will break your parser.
With that disclaimer out of the way...
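Here is a native batch solution:
    @echo off
    setlocal disableDelayedExpansion
    set input="test.xml"
    set output="names.txt"
    if exist %output% del %output%
    rem Read every line that contains at least one testSuite name attribute.
    for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=" %input%') do (
      set "ln=%%A"
      setlocal enableDelayedExpansion
      call :parseLine
      endlocal
    )
    type %output%
    exit /b

    :parseLine
    rem Strip everything up to and including the next occurrence of the tag text.
    set "ln2=!ln:*<con:testSuite name=!"
    rem If nothing was stripped, no more names remain on this line.
    if "!ln2!"=="!ln!" exit /b
    rem The name is the text between the first pair of double quotes.
    for /f tokens^=2^ delims^=^" %%B in ("!ln2!") do (
      setlocal disableDelayedExpansion
      >>%output% echo(%%B
      endlocal
    )
    set "ln=!ln2!"
    goto :parseLine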
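The FINDSTR /N option prefixes each matching line with its line number and a colon. It is only there to guarantee that no line begins with a ; so that we don't have to worry about the pesky default FOR "EOL" option, which would otherwise skip such lines.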
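A minimal sketch of the EOL issue, assuming a scratch file named demo.txt (a hypothetical name, created in the current directory and deleted at the end). The plain read should skip the line that starts with a semicolon, while the read through FINDSTR /N should return both lines, each prefixed with its line number.

```batch
@echo off
rem demo.txt is a hypothetical scratch file used only for this demo.
>demo.txt echo(;starts with a semicolon
>>demo.txt echo(ordinary line

echo Plain read:
rem The default EOL character is a semicolon, so the first line is skipped.
for /f "delims=" %%A in (demo.txt) do echo [%%A]

echo Read through FINDSTR /N:
rem Every line now starts with a digit, so nothing is skipped.
for /f "delims=" %%A in ('findstr /n "^" demo.txt') do echo [%%A]

del demo.txt
```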
The toggling of delayed expansion on and off is to protect any ! characters that may be in the input file. If you know that ! never appears in the input, then you can simply put setlocal enableDelayedExpansion at the top and remove all other setlocal and endlocal commands.
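For illustration, here is roughly what that simplification could look like. This is a sketch that assumes the input really never contains a ! character; if it does, those characters (and possibly the text around them) would be lost.

```batch
@echo off
rem Simplified variant: only safe if the input never contains an exclamation mark.
setlocal enableDelayedExpansion
set input="test.xml"
set output="names.txt"
if exist %output% del %output%
for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=" %input%') do (
  set "ln=%%A"
  call :parseLine
)
type %output%
exit /b

:parseLine
rem Same loop as before, without the setlocal and endlocal toggling.
set "ln2=!ln:*<con:testSuite name=!"
if "!ln2!"=="!ln!" exit /b
for /f tokens^=2^ delims^=^" %%B in ("!ln2!") do (
  >>%output% echo(%%B
)
set "ln=!ln2!"
goto :parseLine
```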
The last FOR /F leaves its options string unquoted and escapes the =, the space, and the " with ^, which is what allows a double quote to be specified as a delimiter.
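To see the quote delimiter in isolation, here is a small sketch. The sample text is made up, but it has the same shape as the remainder that :parseLine works on: an equals sign followed by the quoted name.

```batch
@echo off
setlocal enableDelayedExpansion
rem Hypothetical sample. SET keeps everything between the first and last quote,
rem so the variable holds: ="testname" id="42"
set "sample=="testname" id="42""
rem With a double quote as the only delimiter, token 1 is the equals sign
rem and token 2 is the name itself.
for /f tokens^=2^ delims^=^" %%B in ("!sample!") do echo(%%B
```

With this sample the loop should print testname.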
Answer to additional question in comment
You cannot simply put the additional constraint in the existing FINDSTR command because it will return the entire line that has a match. Remember you said yourself, "there might be more than one instance of the matching string in the same line". The first name might start with the correct prefix, and the 2nd name on the same line might not. You only want to keep the one that starts appropriately.
One solution is to simply change the >>%output% echo(%%B line as follows:
echo(%%B|findstr "^lp_" >>%output%
The FINDSTR is using a regular expression meta-character ^ to specify that the string must start with lp_. The quotes have already been removed at this point, so we don't have to worry about them.
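A quick way to try the filter on its own is sketched below. The three names are hypothetical, and only the two that begin with lp_ should be printed.

```batch
@echo off
rem Hypothetical names: other_lp_test contains lp_ but does not start with it.
for %%N in (lp_login other_lp_test lp_checkout) do echo(%%N|findstr "^lp_"
```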
However, you may run into a situation in the future where you must include " in your search string. Plus it might be marginally faster to include the lp_ screen in the initial FINDSTR so that :parseLine is not called unnecessarily.
FINDSTR requires that double quotes in a search string be escaped with a backslash. But the Windows CMD processor also has its own rules for escaping. Special characters like > need to be either quoted or escaped. The original code used quotes, but you want to include a quote in the string, and that creates unbalanced quotes in your command. Windows batch generally likes quotes in pairs. At least one of the quotes must be escaped for CMD as ^". If the quote needs to be escaped for both CMD and FINDSTR, then it looks like \^".
But any special characters within the string that are no longer functionally quoted from a CMD perspective must be escaped using ^ as well.
Here is one solution that escapes all special characters. It looks awful and is very confusing.
for /f "delims=" %%A in ('findstr /n /c:^"^<con:testSuite^ name^=\^"lp_^" %input%') do (
Here is another solution that looks much better, but it is still confusing to keep track of what is escaped for CMD and what is escaped for FINDSTR.
for /f "delims=" %%A in ('findstr /n /c:"<con:testSuite name=\"lp_^" %input%') do (
One way to keep things a bit simpler is to convert the search into a regular expression. A single double quote can be matched using [\"\"]. It is a character class expression that matches either a quote or a quote, silly I know. But it keeps quotes paired so that CMD is happy. Now you don't have to worry about escaping any characters for CMD, and you can concentrate on the regex search string.
for /f "delims=" %%A in ('findstr /nr /c:"<con:testSuite name=[\"\"]lp_" %input%') do (