Jonas

Reputation: 1195

Extract numbers that start with 00 in .txt file

Hi, I am trying to find a way to locate a constant in a string and then extract a set number of characters starting at that constant.

For example, I have a .txt file; somewhere in that file there are numbers of the form 00nnn, such as 00234, 00765, ...

So I use:

@echo off
findstr /i "00" *.txt > Listfile.txt
goto :EOF

to find all the lines containing the constant 00.

Now Listfile.txt contains:

        00013       Jonas   Jonas
2015-12-09 12:36:41     Bell (waterproof)   
-   Technical Account
        00014       Jonas           Bell    
-   Technical Account
        00019       Jonas   Jonas
2016-09-12 09:11:12             T16032611   Technical Account
        00055   -   Jonas   Jonas
2016-04-29 08:05:14             T16041312   Technical Account
        00057       Jonas   Jonas
2016-04-04 14:36:50             T15123112   Technical Account
        00067       Jonas   Jonas
2016-06-24 09:33:35             T15123112   Technical Account
    00570       Jonas                   T16041312   Technical Account
        00571       Jonas                   T16041312   Technical Account
        00572       Jonas                   T16041312   Technical Account
        00573       Jonas                   T16041312   Technical Account
        00574       Jonas                   T16041312   Technical Account
        00575       Jonas                   T16041312   Technical Account
        00576       Jonas                   T16041312   Technical Account
        00577       Jonas                   T16041312   Technical Account
        00578       Jonas                   T16041312   Technical 

Next I tried:

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
(
FOR /f "delims=" %%a IN (test.txt) DO (
 SET "line=%%a"
 SET "digits=5!line:~-0,5!"
 FOR /L %%z IN (0,1,9) DO SET "digits=!digits:%%z=!"
 IF NOT DEFINED digits ECHO(!line:~0,5!
)
)>newfile.txt

GOTO :EOF

However, my problem is that the lines contain spaces, so the numbers do not all start at the same position: for some lines "digits=5!line:~-0,5!" would work, while others would need something like "digits=13!line:~-8,13!". How can I extract the numbers regardless of where they start?
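A minimal JavaScript (Node.js) sketch of the idea behind this attempt, assuming the goal is to keep lines whose first five non-blank characters are digits: strip leading spaces and tabs first, so no per-line offset is needed. The function `leadingFiveDigits` and the sample lines are illustrative only, not taken from any answer. In Batch, `for /f` with its default delimiters (space and tab) skips leading blanks in a similar way.

```javascript
// Hypothetical helper: drop leading spaces/tabs, then keep the first five
// characters only when all of them are digits.
function leadingFiveDigits(lines) {
  const result = [];
  for (const line of lines) {
    const trimmed = line.replace(/^[ \t]+/, ""); // remove leading blanks
    const head = trimmed.slice(0, 5);
    if (/^\d{5}$/.test(head)) result.push(head);
  }
  return result;
}

// Sample lines copied from the question's data.
const questionLines = [
  "        00013       Jonas   Jonas",
  "2015-12-09 12:36:41     Bell (waterproof)   ",
  "    00570       Jonas                   T16041312   Technical Account",
];
console.log(leadingFiveDigits(questionLines)); // [ '00013', '00570' ]
```

Like the Batch attempt, this only checks for five digits; tightening the test to `/^00\d{3}$/` would restrict it to numbers starting with 00.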

Upvotes: 2

Views: 120

Answers (4)

Aacini

Reputation: 67216

@echo off
setlocal EnableDelayedExpansion

for /F "delims=" %%a in (test.txt) do (
   set "line=%%a"
   for /F %%b in ("!line:*00=!") do echo 00%%b
)

The input data must have exactly one 00nnn number per line, so I reformatted your example data this way:

        00013       Jonas   Jonas
2015-12-09 12:36:41     Bell (waterproof)   -   Technical Account        00014       Jonas           Bell    
-   Technical Account        00019       Jonas   Jonas
2016-09-12 09:11:12             T16032611   Technical Account        00055   -   Jonas   Jonas
2016-04-29 08:05:14             T16041312   Technical Account        00057       Jonas   Jonas
2016-04-04 14:36:50             T15123112   Technical Account        00067       Jonas   Jonas
2016-06-24 09:33:35             T15123112   Technical Account    00570       Jonas                   T16041312   Technical Account
        00571       Jonas                   T16041312   Technical Account
        00572       Jonas                   T16041312   Technical Account
        00573       Jonas                   T16041312   Technical Account
        00574       Jonas                   T16041312   Technical Account
        00575       Jonas                   T16041312   Technical Account
        00576       Jonas                   T16041312   Technical Account
        00577       Jonas                   T16041312   Technical Account
        00578       Jonas                   T16041312   Technical 

Example output:

00013
00014
00019
00055
00057
00067
00570
00571
00572
00573
00574
00575
00576
00577
00578
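For readers less familiar with Batch string substitution, this JavaScript (Node.js) sketch imitates what the loop above does: `!line:*00=!` removes everything up to and including the first `00`, and `for /F` then keeps the first space- or tab-delimited token, to which `00` is prepended again. The function name `afterFirst00` is hypothetical. One deliberate difference: the Batch code would still echo something for a line with no `00` (the substitution leaves such a line unchanged), whereas this sketch skips it.

```javascript
// Hypothetical helper mirroring `!line:*00=!` followed by `for /F`.
function afterFirst00(lines) {
  const result = [];
  for (const line of lines) {
    const pos = line.indexOf("00");
    if (pos === -1) continue; // unlike the Batch code, skip lines without "00"
    // Text after the first "00", leading blanks removed, first token kept.
    const token = line.slice(pos + 2).replace(/^[ \t]+/, "").split(/[ \t]+/)[0];
    if (token !== "") result.push("00" + token);
  }
  return result;
}

console.log(afterFirst00([
  "        00013       Jonas   Jonas",
  "-   Technical Account        00019       Jonas   Jonas",
])); // [ '00013', '00019' ]
```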

EDIT: New method added using JScript

My first answer is a simple method that solves this problem with just a small Batch file. However, now that other answers have suggested using regular expressions, you should know that you don't need non-standard utilities (like grep) or PowerShell to use a simple regex in a Batch file. You can use a couple of lines of JScript, which comes preinstalled on all Windows versions from XP onward:

@if (@CodeSection == @Batch) @then

@echo off
cscript //nologo //E:JScript "%~F0" < test.txt
goto :EOF

@end

var match, search = /00\d{3}/g, file = WScript.StdIn.ReadAll();
while ( match = search.exec(file) ) WScript.Stdout.WriteLine(match[0]);

Save this code in a Batch file (.bat extension); it runs much faster than the PowerShell solution. You can also get the complete solution to your problem by replacing the cscript line with the next line, which reviews all *.txt files and extracts the numbers in one operation:

findstr /i "00" *.txt | cscript //nologo //E:JScript "%~F0"
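The JScript part is ordinary ECMAScript, so the same loop can be tried outside Windows Script Host. This Node.js sketch (the function name `extractAll` is hypothetical) replaces `WScript.StdIn.ReadAll()` with a string argument. The `g` flag is what makes the loop terminate: `exec` resumes searching from the regex's `lastIndex`, and without `g` it would return the first match forever.

```javascript
// Same regex and exec loop as the JScript answer, with the file text passed in.
function extractAll(text) {
  const search = /00\d{3}/g; // the g flag makes exec() advance via lastIndex
  const found = [];
  let match;
  while ((match = search.exec(text)) !== null) {
    found.push(match[0]);
  }
  return found;
}

const fileText = "        00013       Jonas   Jonas\n" +
  "2015-12-09 12:36:41     Bell (waterproof)   -   Technical Account        00014\n";
console.log(extractAll(fileText)); // [ '00013', '00014' ]
```

In JavaScript, `fileText.match(/00\d{3}/g)` returns the same array (or `null` when nothing matches); the explicit loop is kept here only to mirror the JScript code.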

Upvotes: 3

Ryan Bemrose

Reputation: 9266

You can use a regex (from Mark Setchell's answer) by invoking PowerShell and using the Select-String cmdlet to do the same thing as grep.

powershell -c "(sls '00\d{3}' YourFile).matches | select -exp value"

Select-String (sls) uses the regex 00\d{3} to find the lines containing the characters 00 followed by three digits, matching the whole number. The .matches property and select -exp value then extract only the matched text rather than the whole line.

Output

00013
00014
00019
00055
00057
00067
00570
00571
00572
00573
00574
00575
00576
00577
00578

PowerShell is preinstalled on Windows 7 and later, so there is no need to install any third-party programs.
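One detail worth knowing: by default Select-String reports only the first match in each line, so a line holding two 00nnn numbers would yield just one of them; adding `-AllMatches` to the `sls` call returns every match. The Node.js sketch below (hypothetical function name `firstMatchPerLine`) imitates the default behaviour for comparison.

```javascript
// Imitates Select-String without -AllMatches: at most one match per line.
function firstMatchPerLine(text) {
  const re = /00\d{3}/; // no g flag: match() returns only the first hit
  const found = [];
  for (const line of text.split(/\r?\n/)) {
    const m = line.match(re);
    if (m !== null) found.push(m[0]);
  }
  return found;
}

console.log(firstMatchPerLine("00571 00572\n00573")); // [ '00571', '00573' ]
```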

Upvotes: 2

Mark Setchell

Reputation: 207465

Install GNU grep for Windows and run:

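grep -Eo "00[0-9]{3}" YourFile

to look for "00" followed by exactly 3 digits ([0-9]{3}) and print only (-o) the part of each line that matches. The bracket expression is used because \d is a Perl-style shorthand that the extended regex syntax selected by -E does not support.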

Output

00013
00014
00019
00055
00057
00067
00570
00571
00572
00573
00574
00575
00576
00577
00578
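A caveat that applies to all the regex answers: `00[0-9]{3}` (or `00\d{3}`) also matches inside longer runs such as `100234`. If the real files might contain such values, requiring word boundaries is one option; GNU grep's `-w` switch applies a similar whole-word restriction, and the Node.js sketch below (hypothetical function `wholeWordMatches`) shows the same idea with `\b`.

```javascript
// Like `grep -Eo "00[0-9]{3}"`, but \b word boundaries reject matches that sit
// inside a longer word, e.g. "100234" or "T00123".
function wholeWordMatches(text) {
  return text.match(/\b00[0-9]{3}\b/g) || []; // match() gives null when nothing is found
}

console.log(wholeWordMatches("00013 100234 T00123 00578")); // [ '00013', '00578' ]
```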

Upvotes: 1

Stephan

Reputation: 56180

To extract all numbers that start with 00 (assuming there are only spaces or tabs before them on each line):

 for /f %%a in ('type *.txt^|find "00"') do echo %%a
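How this behaves can be seen in a small Node.js sketch (hypothetical function `firstTokens`): `find "00"` keeps the lines containing `00`, and `for /f` with its default options prints the first space- or tab-delimited token of each, skipping leading blanks. The last sample line, taken from the reformatted layout in the first answer, shows why the stated assumption matters: its first token is `-`, not the number.

```javascript
// Imitates: for /f %%a in ('type *.txt^|find "00"') do echo %%a
function firstTokens(lines) {
  return lines
    .filter((line) => line.includes("00")) // the find "00" step
    .map((line) => line.replace(/^[ \t]+/, "").split(/[ \t]+/)[0]); // first token
}

console.log(firstTokens([
  "        00013       Jonas   Jonas",
  "2015-12-09 12:36:41     Bell (waterproof)   ",
  "-   Technical Account        00019       Jonas   Jonas",
])); // [ '00013', '-' ]
```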

Upvotes: 1
