Pankaj

Reputation: 21

Read lines from a text file and delete other lines containing the same substring

I have a huge text file in which each line contains a string matching the pattern FEATURE_. I want to read each line from this file and delete all other lines in the file that contain the same FEATURE_ string.

Please suggest DOS and Perl commands to do this.

For example:

Input:

#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_ABCD
#ifdef FEATURE_WXYZ
#ifdef FEATURE_ABCD
#ifdef FEATURE_WXYZ
#ifdef FEATURE_GHDI
#ifdef FEATUREGHDI
#define FEATURE_ABCD
#define FEATUREGHDI
/* FEATURE_GHDI */

Output:

#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_WXYZ
#ifdef FEATUREGHDI

Upvotes: 2

Views: 612

Answers (4)

Endoro

Reputation: 37569

Assuming your text file is FEATURE.TXT, try this:

@ECHO OFF & setlocal enabledelayedexpansion
for /f "delims=" %%i in (FEATURE.TXT) do (
    set "line0=%%i"
    set "line=!line0:*FEATURE=!"
    if not "!line0!"=="!line!" (
        for /f %%j in ("!line!") do set "line=%%j"
        if not defined $a!line! (
            set "$a!line!=!line!"
            (echo(!line0!)
        )
    )
)   

You can redirect the output to a file if you put >>OUTPUT.TXT after the (echo(!line0!) command.

Output is:

#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_WXYZ
#ifdef FEATUREGHDI

Edit: some improvements to speed up the code.

Upvotes: 2

Aacini

Reputation: 67216

There are several ways to solve this problem, each with its own characteristics. The fastest solutions execute a minimum number of commands per line of the input file and, in particular, avoid external commands. The Batch file below is designed to quickly process a huge text file with many matching lines. The method first creates an auxiliary file with the numbers of the lines to delete (using the FINDSTR command), then merges this file with the original one.

@echo off
setlocal EnableDelayedExpansion

set string=FEATURE_

rem Run FINDSTR to find the lines with the target string and store the numbers of the lines that will be deleted
(for /F "tokens=1* delims=:" %%a in ('findstr /N "%string%" inputFile.txt') do (
   set "line=%%b"
   for /F %%c in ("!line:*%string%=!") do (
      rem If this is the first line with the target string
      if not defined string[%%c] (
         rem Define the target string (and preserve this line)
         set string[%%c]=0
      ) else (
         rem Mark this line for deletion
         echo %%a
      )
   )
)) > linesToDelete.txt
rem Insert the EndOfFile mark
echo 0 >> linesToDelete.txt

rem Merge numbers of lines to delete (from STDIN) and input file (from FOR command)
< linesToDelete.txt (
   set /P lineToDelete=
   for /F "tokens=1* delims=:" %%a in ('findstr /N "^" inputFile.txt') do (
      if %%a neq !lineToDelete! (
         rem Preserve this line
         echo(%%b
      ) else (
         rem Ignore this line and pass to next one to delete
         set /P lineToDelete=
      )
   )
) > outputFile.txt

del linesToDelete.txt

This Batch program fails if the input file contains special Batch characters, such as ! < | > &. This limitation can be fixed if needed.
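For readers who want to see the two-pass idea outside of Batch, here is a minimal Python sketch of it. It is not the Batch file above, only an illustration of the same logic: the names `lines_to_delete` and `merge` are made up for this example, and the regular expression is an assumption that treats the first whitespace-delimited token after FEATURE_ as the key, as FOR /F does by default.

```python
import re

# Assumption: the key is the run of non-whitespace characters that
# follows "FEATURE_" (similar to the first FOR /F token in the Batch file).
FEATURE = re.compile(r"FEATURE_(\S+)")

def lines_to_delete(lines):
    """Pass 1: return the 1-based numbers of lines that repeat a seen key."""
    seen = set()
    doomed = []
    for number, line in enumerate(lines, start=1):
        match = FEATURE.search(line)
        if match is None:
            continue  # lines without the target string are never deleted
        key = match.group(1)
        if key in seen:
            doomed.append(number)  # mark this line for deletion
        else:
            seen.add(key)          # first occurrence: preserve it
    return doomed

def merge(lines, doomed):
    """Pass 2: walk the input once, skipping the numbers listed in doomed.

    doomed must be in ascending order; the trailing 0 plays the role of
    the end-of-file mark, because no real line has number 0.
    """
    pending = iter(doomed + [0])
    next_delete = next(pending)
    kept = []
    for number, line in enumerate(lines, start=1):
        if number == next_delete:
            next_delete = next(pending)  # move on to the next line to delete
        else:
            kept.append(line)
    return kept
```

With the question's sample input, `lines_to_delete` returns `[3, 5, 6, 7, 9, 11]`. Because the target here is FEATURE_ with the underscore, both FEATUREGHDI lines are left alone, which differs slightly from the output requested in the question.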

Upvotes: 0

ElektroStudios

Reputation: 20464

Smallest functional code:

@echo OFF

Set "File=Input.txt"
Set "OutputFile=Output.txt"

For /F "Usebackq Tokens=2,* delims= " %%# in ("%File%") Do (
    Echo "%%#" | Find /I "Feature_" 1>NUL && (
        (Type "%OutputFile%" 2>NUL | FIND /I "%%#" 1>NUL) || (Echo %%#>>"%OutputFile%")))

The code omits lines without the "Feature_" string. When it finds a valid string, it searches the output file to see whether that string already exists, and appends the string only if it does not.

Tested with your input text, received correct output:

#ifdef FEATURE_ABCD
#ifdef FEATURE_GHDI
#ifdef FEATURE_WXYZ

Upvotes: 0

Magoo

Reputation: 80033

@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
FOR /f "delims==" %%i IN ('set found 2^>nul') DO SET "%%i="
SET found=FEATURE_
SET /a count=0
(
FOR /f "delims=" %%i IN ('findstr /n "$" ^<feature.txt') DO (
 SET feature=%%i
 SET line=!feature:*:=!
 IF DEFINED line (
  SET feature=!line:*FEATURE_=!
  IF "!line!"=="!feature!" (ECHO(!line!) ELSE (
   FOR /f %%f IN ("!feature!") DO SET feature=%%f&SET found|FINDSTR /e "=%%f" >NUL
   IF ERRORLEVEL 1 (
    ECHO(!line!
    SET found!count!=!feature!
    SET /a count+=1
   ) 
  )
 ) ELSE (ECHO()
)
) >newfile.txt

For each line, including empty lines:

  • number the line, then strip the number; generate an empty line if the original was empty
  • otherwise, see whether the line contains the target text, and echo the line if it does not
  • otherwise, see whether the string after the target has already been found
  • if not, generate the line and record the new target suffix in found<count> (found0, found1, ...)
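The steps above can be sketched in Python as follows. This is only a hedged illustration of the logic, not a translation of the Batch file: the function name `dedupe_features` is made up, the match is case-sensitive (an assumption), and a set stands in for the found0, found1, ... variables.

```python
def dedupe_features(lines, target="FEATURE_"):
    """Keep every line, except later lines that repeat a target suffix."""
    found = set()  # suffixes recorded so far
    out = []
    for line in lines:
        _, sep, tail = line.partition(target)
        if not sep:
            # Empty lines and lines without the target pass through unchanged.
            out.append(line)
            continue
        # The suffix is the first whitespace-delimited token after the target.
        tokens = tail.split()
        suffix = tokens[0] if tokens else ""
        if suffix not in found:
            found.add(suffix)  # record the new suffix
            out.append(line)   # and generate the line
    return out
```

Unlike a filter that only emits matching lines, this keeps blank lines and unrelated text in place, so the output is the original file minus the repeated FEATURE_ lines.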

BUT

Further to Aacini's comment, perhaps you should sit down with a nice hot cup of tea and think about what you really want here.

If you do as you've said, then the sequence

#ifdef FEATURE_ABCD
something
endif

or

#ifdef FEATURE_ABCD something

would likely produce something you don't really want - and how about

#ifdef FEATURE_ABCD
...
#define FEATURE_ABCD
...
#ifdef FEATURE_ABCD

??
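To make that last case concrete, here is a small Python sketch that applies the "first occurrence wins" rule to it. The generator `drop_repeats` is a hypothetical helper written for this illustration and assumes the key is the first token after FEATURE_.

```python
def drop_repeats(lines, target="FEATURE_"):
    """Yield lines, skipping any later line that repeats a target suffix."""
    seen = set()
    for line in lines:
        _, found, rest = line.partition(target)
        words = rest.split()
        key = words[0] if words else ""
        if found and key in seen:
            continue  # a repeat of an earlier FEATURE_ string: dropped
        if found:
            seen.add(key)
        yield line

source = [
    "#ifdef FEATURE_ABCD",
    "...",
    "#define FEATURE_ABCD",
    "...",
    "#ifdef FEATURE_ABCD",
]
result = list(drop_repeats(source))
print(result)
```

Only the first `#ifdef FEATURE_ABCD` and the two `...` lines survive. The `#define` and the second `#ifdef` are gone, while whatever those `...` lines stand for stays behind without its guard.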

Upvotes: 1
