Reputation: 37
I have the following script, which runs a pipeline on each file in a directory to match a specific pattern and then appends the matching output to a .csv file. The formatting is what I want, but each pattern that I am matching on is printed twice, like this:
Match1
Match2
Match1
Match2
Piping uniq and sort into this script is not fixing the problem so I suspect my syntax is off. I have not been able to find a solution via Google or other answers thus far. Any help is appreciated, thanks!
#!/usr/bin/env bash
FILES=/Users/User1/Desktop/Folder/"*"
for f in $FILES
do
  echo "Processing $f file..."
  # take action on each file. $f store current file name
  sed -n /"New Filters"/,/"Modified Filters"/p "$f" |
    grep -v -e 'Bugtraq ID:' -e 'Common Vulnerabilities and Exposures:' -e 'Android' |
    grep -E '(^|[^0-9])[0-9]{5}($|[^0-9])' |
    sed 's/:/,/1' >> NewFile.csv
  echo "Complete. Check NewFile.csv"
done;
Sample Input: Expected Result is to extract text in bold
Filters
New Filters
Modified Filters (logic changes)
Modified
Filters (metadata changes only)
Removed Filters
Filters
New Filters:
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
Modified Filters (logic changes):
Text I don't want
Modified Filters (metadata changes only):
Text I don't want
Upvotes: 0
Views: 2384
Reputation: 23804
If you need to replace the first : with , then you can try:
perl -lne 'BEGIN{$/=undef} push @r,$& while /(?<=New Filters).*?(?=Modified Filters)/gs; @r2=grep(!/Bugtraq ID:|Common Vulnerabilities and Exposures:|Android/g,@r); /\d{5}[^\n]+\d/g && ($_=$&) && s/:/,/ && print for @r2' file
for this sample input file
Modified Filters (logic changes)
Modified
Filters (metadata changes only)
Removed Filters
Filters
New Filters:
29722: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
Modified Filters (logic changes):
Text I don't want
Modified Filters (metadata changes only):
Text I don't want
New Filters:
Bugtraq ID:
Modified Filters (logic changes):
New Filters:
Common Vulnerabilities and Exposures:
Modified Filters (logic changes):
New Filters:
Android
Modified Filters (logic changes):
New Filters:
29723: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
Modified Filters (logic changes):
New Filters:
29724: HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
Modified Filters (logic changes):
output will be:
29722, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
29723, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
29724, HTTP: Dragonfly Backdoor.Goodor Go Implant CnC Beacon 1
Upvotes: 0
Reputation: 203645
We can't tell what your problem is without sample input/output so this isn't an answer to that, but here's how to really do what you're trying to do with that script:
awk '
    FNR==1 { printf "Processing %s file...\n", FILENAME | "cat>&2" }
    # start of the block to extract
    /New Filters/ { inBlock=1 }
    inBlock {
        if ( !/Bugtraq ID:|Common Vulnerabilities and Exposures:|Android/ &&
             /(^|[^0-9])[0-9]{5}($|[^0-9])/ ) {
            sub(/:/,",")
            print
        }
    }
    # end of the block (checked last so the closing line is still tested)
    /Modified Filters/ { inBlock=0 }
' /Users/User1/Desktop/Folder/* > "NewFile.csv"
echo "Complete. Check NewFile.csv"
Note that there's no shell loop required. See "Why is using a shell loop to process text considered bad practice?" on Unix & Linux Stack Exchange.
Any time you find yourself using multiple commands (in particular multiple seds and/or greps) and pipes just to manipulate text, consider just using awk instead.
Upvotes: 2
Reputation: 361635
Are you running the script twice? It appends with >> NewFile.csv without truncating the file at the beginning, so if it is run twice the CSV file ends up with repeated output. You can add > NewFile.csv at the beginning of the script to empty the output file.
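A minimal sketch of that idea, assuming it is acceptable to redirect the whole loop once with > instead of appending inside it. The temporary directory, the sample file, and the extract function are made up for the demo and are not from the question:

```shell
#!/usr/bin/env bash
# Demo data in a throwaway directory (hypothetical, only for illustration).
dir=$(mktemp -d)
out="$dir/NewFile.csv"
mkdir "$dir/in"
printf '%s\n' 'New Filters:' '29722: HTTP: Example filter' \
    'Modified Filters (logic changes):' > "$dir/in/sample.txt"

extract() {
    # '>' on the loop truncates the output once per run;
    # '>>' inside the loop would keep rows from earlier runs.
    for f in "$dir"/in/*; do
        sed -n '/New Filters/,/Modified Filters/p' "$f" |
            grep -E '(^|[^0-9])[0-9]{5}($|[^0-9])' |
            sed 's/:/,/'
    done > "$out"
}

extract
extract   # a second run overwrites the file instead of appending to it
cat "$out"
```

With this layout, running the script any number of times leaves one row per match in the output file.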
Or, perhaps you have duplicate input files.
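One rough way to check for duplicate input files is to compare checksums. This is only a sketch: the sample files are invented, it assumes file names without whitespace, and a matching cksum value and size strongly suggests, but does not prove, identical content:

```shell
#!/usr/bin/env bash
# Demo files in a throwaway directory (hypothetical, only for illustration).
dir=$(mktemp -d)
printf 'same\n'  > "$dir/a.txt"
printf 'same\n'  > "$dir/b.txt"
printf 'other\n' > "$dir/c.txt"

# cksum prints "checksum size name"; report every file whose
# checksum and size were already seen on an earlier file.
dupes=$(cksum "$dir"/* | awk '{ key = $1 " " $2 } seen[key]++ { print $3 }')
echo "$dupes"
```

Any file name printed here has the same checksum and size as an earlier file, so it would contribute the same matches a second time.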
Upvotes: 1