Glycoversi

Reputation: 77

FINDSTR and REGEX returns more numbers than I need

I need FINDSTR to return only the number that I specify, say 2, and not 20, 21, 22, 200, 201, 202 etc., but my code picks up any number that starts with the number I specify. If I specify 13, it will grab 13 AND 130 - 139.

Why does the regular expression in FINDSTR /b /r /c:"13[^0-9]" elabs.txt return 13 AND 130 - 139?

My code:

@echo off
setlocal enabledelayedexpansion enableextensions

set "_RunlistName=ELabs.tab"
set /p "Begin=Begin is:"
set /p "End=end is:"

:forloop
REM Find event time and convert to seconds, then output start time and time between two runs into new config file.
For /F "tokens=1,2,3,4 delims=-:." %%a in ('Findstr /B /R /C:"%Begin%[^0-9]" !_RunlistName!') do (
    set /A "hr1=1%%b-100, min1=1%%c-100, sec1=1%%d-100"
    set /A "BS=(hr1 * 3600) + (min1 * 60) + sec1"
    )
For /F "tokens=1,2,3,4 delims=-:." %%a in ('Findstr /B /R /C:"%End%[^0-9]" !_RunlistName!') do (
    set /A "hr2=1%%b-100, min2=1%%c-100, sec2=1%%d-100"
    set /A "ES=(hr2 * 3600) + (min2 * 60) + sec2"
    )

REM For testing...
set /A "delta=ES-BS"
Echo delta = !ES! - !BS! = !delta! 
echo.
echo begin=!begin!, !hr1!:!min1!:!sec1!
Echo end=!end!, !hr2!:!min2!:!sec2!
pause

endlocal
exit /B

Example text from source file:

1   000-00:01:13.221408 6.687540    DATA    RUN_BIT
2   000-00:01:28.533108 6.057900    Zbin    RUN_BIT
3   000-00:01:41.879568 7.632000    Rbin    RUN_BIT
4   000-00:01:55.521768 7.078680    Xbin    RUN_BIT
5   000-00:02:09.841308 7.269480    DATA    RUN_BIT
6   000-01:02:15.702138 39.419280   DATA    RUN_BIT
7   000-01:04:10.840398 70.643700   DATA    RUN_BIT
8   000-01:05:21.741678 16.952262   DATA    RUN_BIT
9   000-01:06:08.897580 228.587940  DATA    RUN_BIT
10  000-01:12:14.890140 17.191080   DATA    RUN_BIT
11  000-01:13:38.126640 59.157540   DATA    RUN_BIT
12  000-01:14:37.551300 47.337480   DATA    RUN_BIT
13  000-01:15:25.155900 66.579024   DATA    RUN_BIT
14  000-01:16:32.002044 58.326924   DATA    RUN_BIT
15  000-01:17:30.596088 42.328980   DATA    RUN_BIT
16  000-01:18:13.182648 13.794840   DATA    RUN_BIT
17  000-01:18:27.235068 16.990740   DATA    RUN_BIT
18  000-01:18:44.483388 27.179460   DATA    RUN_BIT
19  000-01:19:11.929968 34.143660   DATA    RUN_BIT
20  000-01:23:00.689628 206.025840  DATA    RUN_BIT
21  000-01:26:26.973048 28.791084   DATA    RUN_BIT
22  000-01:26:56.021712 58.479564   DATA    RUN_BIT
23  000-01:27:54.758856 13.556340   DATA    RUN_BIT

Upvotes: 2

Views: 472

Answers (3)

Mofi

Reputation: 49096

The command

%SystemRoot%\System32\findstr.exe /B /R /C:"13[^0-9]" ELabs.tab

works and outputs just the line beginning with 13 and ignoring the lines beginning with 130 to 139, 1300 to 1399, ...

This command is used inside a FOR /F loop, which captures the output written to STDOUT and processes it. To do that, FOR implicitly runs the command line in a separate command process in the background.

But before the command line

Findstr /B /R /C:"%Begin%[^0-9]" !_RunlistName!

is executed with %SystemRoot%\System32\cmd.exe /c in a background command process, it is parsed twice by the command process executing the batch file.

The first parsing step happens before the entire FOR command block is executed. It replaces the environment variable reference %Begin% with the entered number, for example 13. The caret character ^ is kept in this parsing step. So before running FOR, the command line to execute later becomes for example

Findstr /B /R /C:"13[^0-9]" !_RunlistName!

But because !_RunlistName! references the environment variable _RunlistName with delayed expansion, this command line is parsed once more immediately before it is executed in another command process. In this second parsing step ^ is interpreted as an escape character and therefore removed. For that reason the background command process is started with the command line

C:\Windows\System32\cmd.exe /c Findstr /B /R /C:"13[0-9]" ELabs.tab

The lines found by FINDSTR are now wrong: the ^, which should mean NOT inside the character class definition, was interpreted as an escape character and removed. That changes the meaning of the regular expression, which now matches 13 followed by another digit, such as 130 to 139.
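
The effect can be reproduced without FOR or FINDSTR. The following is only a minimal sketch; the variable name Name and the use of ECHO are assumptions chosen for illustration. Both ECHO lines contain the same quoted pattern, but only the second one contains an exclamation mark and is therefore scanned by the delayed expansion phase.

```batch
@echo off
setlocal EnableExtensions EnableDelayedExpansion

rem Hypothetical variable, used only for this demonstration.
set "Name=ELabs.tab"

rem No exclamation mark in this line: the caret inside the quotes is kept.
echo "13[^0-9]" %Name%

rem This line contains !Name!, so the delayed expansion phase scans it
rem and consumes the caret as an escape character, even inside quotes.
echo "13[^0-9]" !Name!

endlocal
```

The first line should be printed with the caret and the second one without it, which is the same loss that happens to the FINDSTR command line inside FOR.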

One solution is using within FOR command line:

Findstr /B /R /C:"%Begin%[^0-9]" %_RunlistName%

With %_RunlistName% there is no second parsing step by the command process executing the batch file, so the caret character is kept when this command line is run later, for example with

C:\Windows\System32\cmd.exe /c Findstr /B /R /C:"13[^0-9]" ELabs.tab

A second solution is escaping the caret with one more caret, i.e. using

Findstr /B /R /C:"%Begin%[^^0-9]" !_RunlistName!

This also results in executing FINDSTR for example with /C:"13[^0-9]".

These two solutions were offered also by MC ND.

Another solution is using within FOR command line:

Findstr /B /R /C:"%Begin%\>" %_RunlistName%

\> means end of a word as explained in help output by running findstr /? from within a command prompt window. As the searched number must be found at beginning of a line, usage of \> results in finding only the two lines starting with exactly the two entered numbers.
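
Put together, this last solution could look like the sketch below. It assumes that ELabs.tab from the question is in the current directory, and it hard-codes 13 for Begin instead of prompting. Delayed expansion is left disabled because nothing in this sketch needs it.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion

set "_RunlistName=ELabs.tab"
rem Assumption: fixed value instead of the set /p prompt from the question.
set "Begin=13"

rem The end-of-word anchor keeps 13 from matching 130 to 139.
for /F "tokens=1-4 delims=-:." %%a in ('%SystemRoot%\System32\findstr.exe /B /R /C:"%Begin%\>" %_RunlistName%') do (
    rem The leading 1 and the -100 avoid octal interpretation of 08 and 09.
    set /A "hr1=1%%b-100, min1=1%%c-100, sec1=1%%d-100"
)

echo begin=%Begin%, %hr1%:%min1%:%sec1%
endlocal
```

The same pattern would be used a second time for End, as in the question.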


How to find out what is going on here on batch file execution?

There is the great, free Windows Sysinternals tool Process Monitor.

After downloading the ZIP file and extracting the files to any local directory like "%ProgramFiles%\Sysinternals" with administrator privileges or "%USERPROFILE%\Desktop\Sysinternals" with current user permissions, the executable file Procmon.exe must be executed with Run as administrator.

On first start of this free tool offered by Microsoft it is necessary to accept the license agreement.

Then the Process Monitor Filter dialog is opened and it is advisable for the investigation of this batch file execution to add two filters:

  1. Process Name is cmd.exe then Include and clicking on button Add.
  2. Process Name is findstr.exe then Include and clicking on button Add.

Next, look at the last 5 symbols on the right side of the toolbar. They indicate the state of the general display filters and toggle it on click. Only the filing cabinet symbol with the tooltip Show File System Activity should be enabled, because file system activity is all that is needed to investigate this batch process.

Press Ctrl+X or click on the fifth symbol from the left side of the toolbar to clear the events already recorded, then execute the batch file from within a command prompt window (preferred when debugging a batch file) or with a double click on the batch file.

After the batch file has finished and input focus is back on Process Monitor, press Ctrl+E or click on the third symbol from the left side of the toolbar to stop capturing, which takes some seconds.

Now look at the recorded events. The batch file is executed by a cmd.exe process with a specific PID (process identifier).

On scrolling down it can be seen that there is suddenly one more process cmd.exe with a different PID. This is the background command process executed to run the FINDSTR command line.

To see the command line used to start this second cmd.exe process, right-click on a line of the second cmd.exe to open the context menu, click on the first context menu item Properties, and then click on the tab Process.

On using Findstr /B /R /C:"%Begin%[^0-9]" !_RunlistName! in batch file and entering 13 for Begin the command line is:

C:\Windows\system32\cmd.exe /c Findstr /B /R /C:"13[0-9]" ELabs.tab

And this makes it clear that the caret character was removed from the command line already before execution of this background command process for executing FINDSTR.

For that reason it is no surprise that the command line of process findstr.exe is in this case:

Findstr /B /R /C:"13[0-9]" ELabs.tab

After modifying the batch file as suggested, and running it each time after clearing the Process Monitor log with Ctrl+X and enabling capturing again with Ctrl+E, it can be seen how the background command process for the command line specified in FOR is really started, and that FINDSTR is finally executed with the right parameters.

Free Process Monitor is really a great tool to find out the reason for an unexpected behavior of an application or script.

Upvotes: 1

MC ND

Reputation: 70923

Following this set of rules, your problem is that the delayed expansion phase is consuming the ^ character in the command being executed by the for /f. Double the ^ (one caret will be consumed to escape the other) or remove the delayed variable expansion phase from the command.

Use (keep delayed expansion, double caret)

For /F "tokens=1,2,3,4 delims=-:." %%a in ('
    Findstr /B /R /C:"%Begin%[^^0-9]" !_RunlistName!
') do (

Or (keep caret, remove delayed expansion)

For /F "tokens=1,2,3,4 delims=-:." %%a in ('
    Findstr /B /R /C:"%Begin%[^0-9]" %_RunlistName%
') do (

Upvotes: 2

Compo

Reputation: 38613

I'm struggling to understand the need for findstr anyhow.

For /F "UseBackQ Tokens=1,3,4,5 Delims=-:. " %%a In ("%_RunlistName%") Do If "%%a"=="%Begin%" (...

And

For /F "UseBackQ Tokens=1,3,4,5 Delims=-:. " %%a In ("%_RunlistName%") Do If "%%a"=="%End%" (...
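
A fuller version of this idea might look like the sketch below. The loop bodies are not given above, so the arithmetic here is borrowed from the question and is only an assumption about what was intended. The values 13 and 16 stand in for the prompted input, and both comparisons are done in a single pass over the file. The sample data appears to be separated by spaces; if the real file uses tabs, a literal tab character would have to be added to Delims.

```batch
@echo off
setlocal EnableExtensions DisableDelayedExpansion

set "_RunlistName=ELabs.tab"
rem Assumption: fixed values instead of the set /p prompts from the question.
set "Begin=13"
set "End=16"

rem Tokens: a = run number, b = hours, c = minutes, d = seconds.
rem Token 2 (the leading 000) is skipped.
for /F "usebackq tokens=1,3,4,5 delims=-:. " %%a in ("%_RunlistName%") do (
    rem The leading 1 and the -100 avoid octal interpretation of 08 and 09.
    if "%%a"=="%Begin%" set /A "BS=(1%%b-100)*3600 + (1%%c-100)*60 + (1%%d-100)"
    if "%%a"=="%End%" set /A "ES=(1%%b-100)*3600 + (1%%c-100)*60 + (1%%d-100)"
)

set /A "delta=ES-BS"
echo delta = %ES% - %BS% = %delta%
endlocal
```

Because the run number is compared as a whole string, 13 cannot match 130 to 139, and no regular expression is involved.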

Upvotes: 1
