Reputation: 21
I have a huge text file in which each line contains a string matching the pattern FEATURE_. I want to read each line of this file and keep only the first line for each FEATURE_ string, deleting all other lines that contain the same string.
Please suggest a DOS (batch) or Perl command to do this.
For example:
Input:
#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_ABCD
#ifdef FEATURE_WXYZ
#ifdef FEATURE_ABCD
#ifdef FEATURE_WXYZ
#ifdef FEATURE_GHDI
#ifdef FEATUREGHDI
#define FEATURE_ABCD
#define FEATUREGHDI
/* FEATURE_GHDI */
Output:
#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_WXYZ
#ifdef FEATUREGHDI
Upvotes: 2
Views: 612
Reputation: 37569
Assuming your text file is FEATURE.TXT, try this:
@ECHO OFF & setlocal enabledelayedexpansion
for /f "delims=" %%i in (FEATURE.TXT) do (
set "line0=%%i"
set "line=!line0:*FEATURE=!"
if not "!line0!"=="!line!" (
for /f %%j in ("!line!") do set "line=%%j"
if not defined $a!line! (
set "$a!line!=!line!"
(echo(!line0!)
)
)
)
You can redirect the output to a file by putting >>OUTPUT.TXT after the (echo(!line0!) command.
Output is:
#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_WXYZ
#ifdef FEATUREGHDI
Edit: some improvements to speed up the code.
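For readers who find the delayed-expansion syntax hard to follow, here is a rough Python sketch of the same logic, not a replacement for the batch file. The key for each line is the first whitespace-delimited token after the first occurrence of FEATURE, and only the first line per key is kept; lines without FEATURE are dropped. The function name `dedupe_by_feature` is made up for illustration, and unlike the batch substitution this sketch is case-sensitive.

```python
def dedupe_by_feature(lines, marker="FEATURE"):
    """Keep the first line for each token that follows `marker`.

    Hypothetical helper mirroring the batch logic above:
    lines that do not contain `marker` are dropped.
    """
    seen = set()
    kept = []
    for line in lines:
        # Split at the first occurrence of the marker.
        _, found, tail = line.partition(marker)
        if not found:
            continue  # no marker on this line: skip it
        tokens = tail.split()
        key = tokens[0] if tokens else ""
        if key not in seen:
            seen.add(key)
            kept.append(line)
    return kept


if __name__ == "__main__":
    sample = [
        "#ifdef FEATURE_ABCD",
        "#ifdef FEATURE_GHDI",
        "#ifdef FEATURE_ABCD",
        "#ifdef FEATUREGHDI",
        "/* FEATURE_GHDI */",
    ]
    for line in dedupe_by_feature(sample):
        print(line)
```

Because the key starts right after FEATURE, FEATURE_GHDI and FEATUREGHDI produce different keys (`_GHDI` and `GHDI`), which is why both survive in the output shown above.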
Upvotes: 2
Reputation: 67216
There are several different ways to solve this problem, each with its own characteristics. The fastest solutions execute a minimum number of commands per line of the input file and, in particular, avoid external commands. The Batch file below is designed to process a huge text file with many matching lines quickly. The method first creates an auxiliary file with the numbers of the lines to delete (using the FINDSTR command), then merges this file with the original one.
@echo off
setlocal EnableDelayedExpansion
set string=FEATURE_
rem Run FINDSTR to find the lines with the target string and store the numbers of the lines that will be deleted
(for /F "tokens=1* delims=:" %%a in ('findstr /N "%string%" inputFile.txt') do (
set "line=%%b"
for /F %%c in ("!line:*%string%=!") do (
rem If this is the first line with the target string
if not defined string[%%c] (
rem Define the target string (and preserve this line)
set string[%%c]=0
) else (
rem Mark this line for deletion
echo %%a
)
)
)) > linesToDelete.txt
rem Insert the EndOfFile mark
echo 0 >> linesToDelete.txt
rem Merge numbers of lines to delete (from STDIN) and input file (from FOR command)
< linesToDelete.txt (
set /P lineToDelete=
for /F "tokens=1* delims=:" %%a in ('findstr /N "^" inputFile.txt') do (
if %%a neq !lineToDelete! (
rem Preserve this line
echo(%%b
) else (
rem Ignore this line and pass to next one to delete
set /P lineToDelete=
)
)
) > outputFile.txt
del linesToDelete.txt
This Batch program fails if the input file contains special Batch characters, like ! < | > &. This limitation may be fixed, if needed.
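The two-phase idea (first collect the numbers of the lines to delete, then merge that list against the input) may be easier to see outside Batch. Below is a minimal Python sketch under the same assumptions as the batch file: the target string is FEATURE_, the key is the first token after it, and a trailing 0 acts as the end-of-list mark. The names `lines_to_delete` and `merge_delete` are hypothetical.

```python
def lines_to_delete(lines, target="FEATURE_"):
    """Phase 1: return 1-based numbers of lines whose key was already seen."""
    seen = set()
    doomed = []
    for number, line in enumerate(lines, start=1):
        _, found, tail = line.partition(target)
        tokens = tail.split()
        if not found or not tokens:
            continue  # no target string (or nothing after it): never deleted
        key = tokens[0]
        if key in seen:
            doomed.append(number)  # repeated key: mark for deletion
        else:
            seen.add(key)          # first occurrence: remember and preserve
    return doomed


def merge_delete(lines, doomed):
    """Phase 2: walk the input once, skipping the marked line numbers."""
    pending = iter(doomed + [0])   # 0 is the end mark; it never matches
    next_doomed = next(pending)
    kept = []
    for number, line in enumerate(lines, start=1):
        if number != next_doomed:
            kept.append(line)
        else:
            next_doomed = next(pending)
    return kept


if __name__ == "__main__":
    text = ["#ifdef FEATURE_A", "other", "#ifdef FEATURE_A", "#ifdef FEATURE_B"]
    print(merge_delete(text, lines_to_delete(text)))
```

In this sketch, lines that do not contain the target string are passed through unchanged, because they never appear in the list of lines to delete.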
Upvotes: 0
Reputation: 20464
The smallest functional code:
@echo OFF
Set "File=Input.txt"
Set "OutputFile=Output.txt"
For /F "Usebackq Tokens=2,* delims= " %%# in ("%File%") Do (
Echo "%%#" | Find /I "Feature_" 1>NUL && (
(Type "%OutputFile%" | FIND /I "%%#" 1>NUL) || (Echo %%#>>"%OutputFile%")))
The code omits lines without the "Feature_" string. When it finds a valid string, it searches the output file to see whether the string already exists, and adds or omits the string accordingly.
Tested with your input text, received correct output:
#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_WXYZ
Upvotes: 0
Reputation: 80033
@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
FOR /f "delims==" %%i IN ('set found 2^>nul') DO SET "%%i="
SET found=FEATURE_
SET /a count=0
(
FOR /f "delims=" %%i IN ('findstr /n "$" ^<feature.txt') DO (
SET feature=%%i
SET line=!feature:*:=!
IF DEFINED line (
SET feature=!line:*FEATURE_=!
IF "!line!"=="!feature!" (ECHO(!line!) ELSE (
FOR /f %%f IN ("!feature!") DO SET feature=%%f&SET found|FINDSTR /e "=%%f" >NUL
IF ERRORLEVEL 1 (
ECHO(!line!
SET found!count!=!feature!
SET /a count+=1
)
)
) ELSE (ECHO()
)
) >newfile.txt
for each line, including empty lines,
foundcounter
BUT
Further to Aacin's comment, perhaps you should sit down with a nice hot cup of tea and think about what you really want here.
If you do as you've said, then the sequence
#ifdef FEATURE_ABCD
something
endif
or
#ifdef FEATURE_ABCD something
would likely produce something you don't really want - and how about
#ifdef FEATURE_ABCD
...
#define FEATURE_ABCD
...
#ifdef FEATURE_ABCD
??
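To make the concern concrete, here is a small Python sketch (an illustration only, with a hypothetical function name and made-up sample lines) that keeps non-FEATURE_ lines and drops every repeated FEATURE_ line. Run on a guarded block that appears twice, it removes the second #ifdef but leaves its body and its #endif behind.

```python
def drop_repeated_features(lines, target="FEATURE_"):
    """Keep lines without `target`; keep only the first line per feature key."""
    seen = set()
    out = []
    for line in lines:
        _, found, tail = line.partition(target)
        tokens = tail.split()
        if not found or not tokens:
            out.append(line)       # unrelated line: always preserved
            continue
        if tokens[0] not in seen:
            seen.add(tokens[0])
            out.append(line)       # first mention of this feature
    return out


# Made-up sample: the same feature guards two separate blocks.
source = [
    "#ifdef FEATURE_ABCD",
    "int a;",
    "#endif",
    "#define FEATURE_ABCD",
    "#ifdef FEATURE_ABCD",
    "int b;",
    "#endif",
]

if __name__ == "__main__":
    for line in drop_repeated_features(source):
        print(line)
```

The result has one #ifdef and two #endif lines, and the #define is gone as well, so the file no longer preprocesses the way the original did.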
Upvotes: 1