Guido Tarsia

Reputation: 2172

How to get whole phrase containing token in batch

My batch file should write every whole phrase containing "dog." to a text file.

For example, suppose that input file is this:

dog.house, asdasd dog.dinner
dog.hello

The result should be:

dog.house
dog.dinner
dog.hello

So far, my batch file is:

for /F "tokens=*" %%a in ('findstr /c:"dog." temp.txt') do echo %%a >> resultado.txt

But it returns the whole line instead of only the phrase containing "dog.". I'm having trouble understanding how batch works.

Upvotes: 0

Views: 1700

Answers (2)

dbenham

Reputation: 130879

You could use a good tutorial on the FOR command. My favorite is http://judago.webs.com/batchforloops.htm

rojo has a good solution as long as the file does not contain * or ? characters, which a simple FOR treats as file name wildcards. The simple FOR breaks "phrases" at space, tab, comma, semicolon, and equals sign. Here is a slightly more efficient version that pipes all "phrases" through a single iteration of FINDSTR. I've also used arcane syntax to disable both DELIMS and EOL in order to preserve lines that begin with ;.

@echo off
setlocal
cmd /c "for /f delims^=^ eol^= %%I in ('findstr /c:"dog." temp.txt') do @for %%a in (%%I) do @echo %%a" | findstr /c:"dog." >resultado.txt
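
To make the splitting rule concrete, here is a rough JavaScript sketch of what the nested FOR does: break each line at space, tab, comma, semicolon, and equals sign, then keep only the pieces that contain the literal text "dog.". It assumes Node.js is available, which the batch file itself does not need, and the function name phrasesContaining is made up for illustration. It does not model quoted strings or the * and ? wildcard expansion that the simple FOR performs.

```javascript
// Sketch only: mimics how a simple FOR splits a line into "phrases".
// Not modeled: quoted strings and the * / ? wildcard expansion done by FOR.
function phrasesContaining(text, token) {
  const result = [];
  for (const line of text.split(/\r?\n/)) {
    // Delimiters named in the answer: space, tab, comma, semicolon, equals.
    for (const phrase of line.split(/[ \t,;=]+/)) {
      // Case-sensitive literal match, like FINDSTR /C: without /I.
      if (phrase.includes(token)) {
        result.push(phrase);
      }
    }
  }
  return result;
}

const input = "dog.house, asdasd dog.dinner\ndog.hello";
console.log(phrasesContaining(input, "dog.").join("\n"));
```

With the sample input from the question this prints dog.house, dog.dinner, and dog.hello, one per line.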

The * and ? limitation could be solved by ditching the simple FOR and using search and replace to substitute <line feed> for space, and echo the results through FINDSTR. There is syntax to do that in batch. But it will only make the solution slower :(

I doubt the speed improvement from the single FINDSTR pass will be enough, though. You can either download and use grep as rojo suggests, or you could switch to a better scripting language like VBScript, JScript, or PowerShell.

EDIT

Sometimes you are not in a position to download an executable onto a machine, and you may not be comfortable using another scripting language.

I have written a hybrid batch/JScript utility called REPL.BAT that performs regex search and replace. It works on all modern Windows machines and does not require any installation. As long as REPL.BAT is in your current folder, or somewhere within your PATH, the following simple line will do what you want. It uses REPL.BAT to convert all spaces into linefeeds and then pipes the results to FINDSTR. This solution should be plenty fast.

repl.bat " " "\n" x <temp.txt | findstr /c:"dog." >resultado.txt
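
For readers who want to see the two steps of that pipeline spelled out, here is a minimal JavaScript sketch of the same idea: turn every space into a line feed, then keep the lines that contain the literal "dog.". It assumes Node.js, and the function name spacesToLinesThenFilter is hypothetical. It is only an illustration of the logic, not a replacement for REPL.BAT.

```javascript
// Sketch only: the same two steps as the REPL.BAT | FINDSTR pipeline.
function spacesToLinesThenFilter(text, token) {
  return text
    .replace(/ /g, "\n") // step 1: every space becomes a line feed
    .split(/\r?\n/) // step 2: look at each resulting line
    .filter((line) => line.includes(token)); // keep lines with the literal token
}

const sample = "dog.house, asdasd dog.dinner\ndog.hello";
console.log(spacesToLinesThenFilter(sample, "dog."));
```

Because only spaces are replaced, the first phrase of the sample input comes out as "dog.house," with its trailing comma. If that matters, a Search expression covering more characters, such as a class like [ ,] (an untested assumption on my part), would be the place to handle it.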

Here is the REPL.BAT utility. Full documentation is embedded within the script.

@if (@X)==(@Y) @end /* Harmless hybrid line that begins a JScript comment

::************ Documentation ***********
:::
:::REPL  Search  Replace  [Options  [SourceVar]]
:::REPL  /?
:::
:::  Performs a global search and replace operation on each line of input from
:::  stdin and prints the result to stdout.
:::
:::  Each parameter may be optionally enclosed by double quotes. The double
:::  quotes are not considered part of the argument. The quotes are required
:::  if the parameter contains a batch token delimiter like space, tab, comma,
:::  semicolon. The quotes should also be used if the argument contains a
:::  batch special character like &, |, etc. so that the special character
:::  does not need to be escaped with ^.
:::
:::  If called with a single argument of /? then prints help documentation
:::  to stdout.
:::
:::  Search  - By default this is a case sensitive JScript (ECMA) regular
:::            expression expressed as a string.
:::
:::            JScript syntax documentation is available at
:::            http://msdn.microsoft.com/en-us/library/ae5bf541(v=vs.80).aspx
:::
:::  Replace - By default this is the string to be used as a replacement for
:::            each found search expression. Full support is provided for
:::            substitution patterns available to the JScript replace method.
:::            A $ literal can be escaped as $$. An empty replacement string
:::            must be represented as "".
:::
:::            Replace substitution pattern syntax is documented at
:::            http://msdn.microsoft.com/en-US/library/efy6s3e6(v=vs.80).aspx
:::
:::  Options - An optional string of characters used to alter the behavior
:::            of REPL. The option characters are case insensitive, and may
:::            appear in any order.
:::
:::            I - Makes the search case-insensitive.
:::
:::            L - The Search is treated as a string literal instead of a
:::                regular expression. Also, all $ found in Replace are
:::                treated as $ literals.
:::
:::            E - Search and Replace represent the name of environment
:::                variables that contain the respective values. An undefined
:::                variable is treated as an empty string.
:::
:::            M - Multi-line mode. The entire contents of stdin is read and
:::                processed in one pass instead of line by line. ^ anchors
:::                the beginning of a line and $ anchors the end of a line.
:::
:::            X - Enables extended substitution pattern syntax with support
:::                for the following escape sequences:
:::
:::                \\     -  Backslash
:::                \b     -  Backspace
:::                \f     -  Formfeed
:::                \n     -  Newline
:::                \r     -  Carriage Return
:::                \t     -  Horizontal Tab
:::                \v     -  Vertical Tab
:::                \xnn   -  Ascii (Latin 1) character expressed as 2 hex digits
:::                \unnnn -  Unicode character expressed as 4 hex digits
:::
:::                Escape sequences are supported even when the L option is used.
:::
:::            S - The source is read from an environment variable instead of
:::                from stdin. The name of the source environment variable is
:::                specified in the next argument after the option string.
:::

::************ Batch portion ***********
@echo off
if .%2 equ . (
  if "%~1" equ "/?" (
    findstr "^:::" "%~f0" | cscript //E:JScript //nologo "%~f0" "^:::" ""
    exit /b 0
  ) else (
    call :err "Insufficient arguments"
    exit /b 1
  )
)
echo(%~3|findstr /i "[^SMILEX]" >nul && (
  call :err "Invalid option(s)"
  exit /b 1
)
cscript //E:JScript //nologo "%~f0" %*
exit /b 0

:err
>&2 echo ERROR: %~1. Use REPL /? to get help.
exit /b

************* JScript portion **********/
var env=WScript.CreateObject("WScript.Shell").Environment("Process");
var args=WScript.Arguments;
var search=args.Item(0);
var replace=args.Item(1);
var options="g";
if (args.length>2) {
  options+=args.Item(2).toLowerCase();
}
var multi=(options.indexOf("m")>=0);
var srcVar=(options.indexOf("s")>=0);
if (srcVar) {
  options=options.replace(/s/g,"");
}
if (options.indexOf("e")>=0) {
  options=options.replace(/e/g,"");
  search=env(search);
  replace=env(replace);
}
if (options.indexOf("l")>=0) {
  options=options.replace(/l/g,"");
  search=search.replace(/([.^$*+?()[{\\|])/g,"\\$1");
  replace=replace.replace(/\$/g,"$$$$");
}
if (options.indexOf("x")>=0) {
  options=options.replace(/x/g,"");
  replace=replace.replace(/\\\\/g,"\\B");
  replace=replace.replace(/\\b/g,"\b");
  replace=replace.replace(/\\f/g,"\f");
  replace=replace.replace(/\\n/g,"\n");
  replace=replace.replace(/\\r/g,"\r");
  replace=replace.replace(/\\t/g,"\t");
  replace=replace.replace(/\\v/g,"\v");
  replace=replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g,
    function($0,$1,$2){
      return String.fromCharCode(parseInt("0x"+$0.substring(2)));
    }
  );
  replace=replace.replace(/\\B/g,"\\");
}
var search=new RegExp(search,options);

if (srcVar) {
  WScript.Stdout.Write(env(args.Item(3)).replace(search,replace));
} else {
  while (!WScript.StdIn.AtEndOfStream) {
    if (multi) {
      WScript.Stdout.Write(WScript.StdIn.ReadAll().replace(search,replace));
    } else {
      WScript.Stdout.WriteLine(WScript.StdIn.ReadLine().replace(search,replace));
    }
  }
}
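
The order of the replacements in the X option branch is deliberate: an escaped backslash is parked as the placeholder \B first, so that its second half cannot be mistaken for the start of \n, \t, and so on, and the placeholder is turned back into a single backslash at the end. The sketch below repeats that logic as a standalone function so it can be tried in isolation. It assumes Node.js, and the name decodeEscapes is made up for this example.

```javascript
// Sketch only: the X-option decoding from REPL.BAT as a standalone function.
function decodeEscapes(replace) {
  // Park every escaped backslash (\\) as the placeholder \B so the steps
  // below cannot mistake its second half for the start of \n, \t, ...
  replace = replace.replace(/\\\\/g, "\\B");
  replace = replace.replace(/\\b/g, "\b");
  replace = replace.replace(/\\f/g, "\f");
  replace = replace.replace(/\\n/g, "\n");
  replace = replace.replace(/\\r/g, "\r");
  replace = replace.replace(/\\t/g, "\t");
  replace = replace.replace(/\\v/g, "\v");
  // \xnn and \unnnn: drop the two-character prefix and parse the rest as hex.
  replace = replace.replace(/\\x[0-9a-fA-F]{2}|\\u[0-9a-fA-F]{4}/g, (match) =>
    String.fromCharCode(parseInt(match.substring(2), 16))
  );
  // Turn the placeholder back into a single backslash.
  return replace.replace(/\\B/g, "\\");
}

// Backslash + n becomes a line feed; an escaped backslash followed by n does not.
console.log(JSON.stringify(decodeEscapes("a\\nb")));
console.log(JSON.stringify(decodeEscapes("a\\\\nb")));
```

The first call yields a string with a real line feed between a and b, while the second keeps a literal backslash followed by n.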

Upvotes: 2

rojo

Reputation: 24476

Yeah, for loops are tricky. for without any switches parses word-by-word on one line. for /f loops line-by-line, using tokens as words if needed. What you need is a combination of the two.

@echo off
for /f "delims=" %%I in ('findstr /i "dog\." temp.txt') do (
    for %%a in (%%I) do (
        echo %%a | findstr /i "dog\."
    )
)

That gives you the result you expect.

Example results

C:\Users\me\Desktop>type temp.txt
dog.house, asdasd dog.dinner
dog.hello

C:\Users\me\Desktop>test.bat
dog.house
dog.dinner
dog.hello

Upvotes: 2
