Nick

Reputation: 37

Bash Script Printing Output Twice

I have the following script, which runs commands on each file in a directory to match a specific pattern and then prints the matching output to a .csv. I have the desired formatting; however, each pattern that I am matching on is getting printed twice, like this:

Match1
Match2
Match1
Match2

Piping uniq and sort into this script does not fix the problem, so I suspect my syntax is off. I have not been able to find a solution via Google or other answers thus far. Any help is appreciated, thanks!

#!/usr/bin/env bash
FILES=/Users/User1/Desktop/Folder/"*"
for f in $FILES
do
  echo "Processing $f file..."
  # take action on each file. $f stores the current file name

  sed -n /"New Filters"/,/"Modified Filters"/p "$f" |
    grep -v -e 'Bugtraq ID:' -e 'Common Vulnerabilities and Exposures:' -e 'Android' |
    grep -E '(^|[^0-9])[0-9]{5}($|[^0-9])' |
    sed 's/:/,/1' >> NewFile.csv

  echo "Complete. Check NewFile.csv"
done

Sample input; the expected result is to extract the line beginning with 29722:

Filters
New Filters
Modified Filters (logic changes)
Modified
Filters (metadata changes only)
Removed Filters

Filters
New Filters:
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1

Modified Filters (logic changes):
Text I don't want

Modified Filters (metadata changes only):
Text I don't want

Upvotes: 0

Views: 2384

Answers (3)

Shakiba Moshiri

Reputation: 23804

if you need to:

  • extract anything between
    • New Filters ... Modified Filters
  • but exclude
    • Bugtraq ID:
    • Common Vulnerabilities and Exposures:
    • Android
  • also match
    • from 5 digits up to the last digit on the line
  • plus
    • replace the first : with ,

then you can try:

perl -lne 'BEGIN{$/=undef} push @r,$& while /(?<=New Filters).*?(?=Modified Filters)/gs; @r2=grep(!/Bugtraq ID:|Common Vulnerabilities and Exposures:|Android/g,@r); /\d{5}[^\n]+\d/g && ($_=$&) && s/:/,/ && print for @r2' file  

for this sample input file

Modified Filters (logic changes)
Modified  
Filters (metadata changes only)   
Removed Filters  

Filters     
New Filters:  
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1  

Modified Filters (logic changes):   
Text I don't want  

Modified Filters (metadata changes only):   
Text I don't want  


New Filters:  
Bugtraq ID:

Modified Filters (logic changes):   


New Filters:  
Common Vulnerabilities and Exposures:


Modified Filters (logic changes):   


New Filters:  
Android
Modified Filters (logic changes):   


New Filters:  

29723: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1  
Modified Filters (logic changes):   


New Filters:  

29724: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1  

Modified Filters (logic changes):   

output will be:

29722, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
29723, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
29724, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1

Upvotes: 0

Ed Morton

Reputation: 203645

We can't tell what your problem is without sample input/output, so this isn't an answer to that, but here's how to really do what you're trying to do with that script:

awk '
FNR==1 { printf "Processing %s file...\n", FILENAME | "cat>&2" }
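# the pipe to "cat>&2" above sends the progress message to stderr,
# so it is not captured in NewFile.csv by the > redirection below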
/"New Filters"/ { inBlock=1 }
inBlock {
    if ( !/Bugtraq ID:|Common Vulnerabilities and Exposures:|Android/ &&
             /(^|[^0-9])[0-9]{5}($|[^0-9])/ ) {
        sub(/:/,",")
        print
    }
}
/"Modified Filters"/ { inBlock=0 }
' /Users/User1/Desktop/Folder/* > "NewFile.csv"
echo "Complete. Check NewFile.csv"

Note that there's no shell loop required. See why-is-using-a-shell-loop-to-process-text-considered-bad-practice.

Any time you find yourself using multiple commands (in particular multiple seds and/or greps) and pipes just to manipulate text, consider just using awk instead.

Upvotes: 2

John Kugelman

Reputation: 361635

Are you running the script twice? It appends with >> NewFile.csv without truncating the file at the beginning, so if run twice the CSV file would end up with repeated output. You can add > NewFile.csv at the beginning to empty out the output file.
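
A minimal sketch of that fix applied to the script from the question; the only change is the > NewFile.csv truncation before the loop, everything else is left as posted:

#!/usr/bin/env bash
FILES=/Users/User1/Desktop/Folder/"*"

# truncate the output file once, before the loop, so re-running the
# script does not append a second copy of every match
> NewFile.csv

for f in $FILES
do
  echo "Processing $f file..."
  sed -n /"New Filters"/,/"Modified Filters"/p "$f" |
    grep -v -e 'Bugtraq ID:' -e 'Common Vulnerabilities and Exposures:' -e 'Android' |
    grep -E '(^|[^0-9])[0-9]{5}($|[^0-9])' |
    sed 's/:/,/1' >> NewFile.csv   # appending inside the loop is now safe

  echo "Complete. Check NewFile.csv"
done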

Or, perhaps you have duplicate input files.
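
One quick, generic way to check for that possibility (not part of the fix above) is to checksum everything in the folder and look for repeated sums:

cksum /Users/User1/Desktop/Folder/*  | sort -n   # files sharing the same checksum and size are likely duplicates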

Upvotes: 1
