Reputation: 15
Here's the problem: I have ~35k files that may or may not contain one or more of the strings in a list of 300 lines, each containing a regex.
If I run grep -rnwl 'C:\out\' --include=*.txt -E --file='comp.log'
I see that a few thousand files contain a match.
Now, how do I get sed to delete each line in these files that matches one of the patterns in comp.log used above?
Edit: comp.log contains a simple regex on each line, but for the most part each string to be matched is unique.
This is an example of how it is structured:
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
...
etc. Silly examples aside, this goes to show that each line is unique save for a few variations, so that shouldn't affect what I really want: to see whether any of these patterns match the lines in the file being worked on. It's no different from 's/pattern/replacement/', except that I want to use the patterns from the file instead of writing them inline.
OK, here's an update (S.O. gets impatient if I don't declare the question answered after a few days). After MUCH fiddling with the @Kenavoz/@Fischer approach, I found a totally different solution, but first things first: creating a modified pattern list for sed to work with does work,
as does @werkritter's approach of dropping sed altogether (this one I find the most... err... "least convoluted" way around the problem).
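The answer that dropped sed is not reproduced in this thread, so the following is only a guess at what such an approach could look like: a minimal sketch that filters a file through grep -v using the pattern list directly. The sample data, the temporary directory, and the file names are made up for illustration, and only two of the sample patterns are used.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/comp.log" <<'EOF'
[a-z]+ mochaccino
[2-9] CheeseCakes
EOF

cat > "$workdir/sample.txt" <<'EOF'
keep this line
one large mochaccino
ordered 5 CheeseCakes
keep this one too
EOF

# -v inverts the match, -w matches whole words (as in the question's grep),
# -E enables extended regexes, -f reads one pattern per line from comp.log.
# grep exits with status 1 when it selects no lines, hence the "|| true".
grep -v -w -E -f "$workdir/comp.log" "$workdir/sample.txt" > "$workdir/sample.txt.new" || true
mv "$workdir/sample.txt.new" "$workdir/sample.txt"

cat "$workdir/sample.txt"
```

Because grep cannot edit in place, the result is written to a new file that then replaces the original; for many files the same two steps would have to run once per file.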
I couldn't make @Mklement's answer work under Windows/Cygwin (it did work under Ubuntu, so... not sure what that means. Figures.)
What ended up solving the problem in a more... long-term, reusable form was a wonderful program pointed out by a colleague, called PowerGrep. It really blows every other option out of the water. Unfortunately it's Windows-only AND it's not free. (Not even advertising here; the thing is not cheap, but it does solve the problem.)
So, considering @werkritter's reply was not a "proper" answer and I can't choose both @Lars Fischer's and @Kenavoz's answers as the solution (they complement each other), I am awarding @Kenavoz the tick mark for being first.
Final thoughts: I was hoping for a simpler, universal and free solution, but apparently there is none.
Upvotes: 1
Views: 746
Reputation: 440677
Both Kenavoz's answer and Lars Fischer's answer use the same ingenious approach: transform the list of input regexes into a list of sed match-and-delete commands, passed as a file acting as the script to sed via -f.
To complement these answers with a single command that puts it all together, assuming you have GNU sed and your shell is bash, ksh, or zsh (to support <(...)):
find 'c:/out' -name '*.txt' -exec sed -i -r -f <(sed 's#.*#/\\<&\\>/d#' comp.log) {} +
find 'c:/out' -name '*.txt' matches all *.txt files in the subtree of dir. c:/out
-exec ... + passes as many matching files as will fit on a single command line to the specified command, typically resulting in only a single invocation.
sed -i updates the input files in-place (conceptually speaking; there are caveats); append a suffix (e.g., -i.bak) to save backups of the original files with that suffix.
sed -r activates support for extended regular expressions, which is what the input regexes are.
sed -f reads the script to execute from the specified filename, which in this case, as explained in Kenavoz's answer, uses a process substitution (<(...)) to make the enclosed sed command's output act like a [transient] file.
The s/// sed command, which uses the alternative delimiter # to facilitate use of literal /, encloses each line from comp.log in /\<...\>/d to yield the desired deletion command; enclosing the input regex in \<...\> ensures matching as a word, as grep -w does.
GNU sed is required, because neither POSIX EREs (extended regular expressions) nor BSD/OSX sed support \< and \>.
The command can be adapted to BSD/OSX sed by replacing -r with -E, and \< / \> with [[:<:]] / [[:>:]].
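To see what the inner sed actually generates, here is a minimal sketch on made-up sample data, assuming GNU sed. It writes the generated script to a file (the hypothetical name comp.sed) instead of using <(...), which makes the script inspectable and also keeps it readable if find has to invoke sed more than once, something a process substitution may not survive because its output can only be read once.

```shell
# Assumes GNU sed (needed for \< and \>). Sample data and names are hypothetical.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/comp.log" <<'EOF'
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
EOF

cat > "$workdir/sample.txt" <<'EOF'
keep this line
GET server3/files/bobba fett.stw
one large mochaccino
ordered 5 CheeseCakes
ordered 5 CheeseCakesXL
EOF

# Same transformation as in the one-liner: wrap every regex in /\<...\>/d.
sed 's#.*#/\\<&\\>/d#' "$workdir/comp.log" > "$workdir/comp.sed"
cat "$workdir/comp.sed"

# Apply the generated script in place, keeping a .bak copy of the original.
sed -i.bak -r -f "$workdir/comp.sed" "$workdir/sample.txt"
cat "$workdir/sample.txt"
```

The line ending in CheeseCakesXL survives because \> requires the match to end at a word boundary, which mirrors what grep -w does.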
Upvotes: 0
Reputation: 15481
You can try this:
sed -f <(sed 's/^/\//g;s/$/\/d/g' comp.log) file > outputfile
All regexes in comp.log are formatted as a sed address with a d command: /regex/d. This command deletes the lines matching the patterns.
The inner sed's output is passed as a file (via process substitution) to the -f option of the outer sed, which is applied to file.
To delete just the strings matching the patterns (not the whole line):
sed -f <(sed 's/^/s\//g;s/$/\/\/g/g' comp.log) file > outputfile
Update: the command's output is redirected to outputfile.
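The difference between the two variants is easiest to see side by side. The sketch below runs both on made-up sample data; it writes the generated scripts to hypothetically named files rather than using <(...), so it also runs in shells without process substitution, and it adds -E to the outer sed on the assumption that patterns such as [a-z]+ are meant as extended regexes.

```shell
# Hypothetical sample data in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT

cat > "$workdir/comp.log" <<'EOF'
server[0-9]\/files\/bobba fett.stw
[a-z]+ mochaccino
[2-9] CheeseCakes
EOF

cat > "$workdir/file" <<'EOF'
keep this line
one large mochaccino please
ordered 5 CheeseCakes today
EOF

# Variant 1: /regex/d deletes every line that matches a pattern.
sed 's/^/\//g;s/$/\/d/g' "$workdir/comp.log" > "$workdir/delete-lines.sed"
sed -E -f "$workdir/delete-lines.sed" "$workdir/file" > "$workdir/lines-deleted"

# Variant 2: s/regex//g removes only the matched text and keeps the rest.
sed 's/^/s\//g;s/$/\/\/g/g' "$workdir/comp.log" > "$workdir/delete-text.sed"
sed -E -f "$workdir/delete-text.sed" "$workdir/file" > "$workdir/text-deleted"

cat "$workdir/lines-deleted"
cat "$workdir/text-deleted"
```

With the second variant the surrounding text stays behind, so a line can end up with doubled spaces where the match used to be.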
Upvotes: 2
Reputation: 10229
Some ideas, but not a complete solution, as it requires some adapting to your script (not shown in the question).
I would convert comp.log into a sed script containing the necessary deletes:
cat comp.log | sed -r "s+(.*)+/\1/ d;+" > comp.sed
That would make your example comp.sed look like:
/server[0-9]\/files\/bobba fett.stw/ d;
/[a-z]+ mochaccino/ d;
/[2-9] CheeseCakes/ d;
Then I would apply the comp.sed script to each file reported by grep (with your -rnwl, that would require some filtering to get the filename):
sed -i.bak -f comp.sed $AFileReportedByGrep
If you have GNU sed, you can use -i for in-place replacement, creating a .bak backup; otherwise, pipe to a temporary file.
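Put together, the two steps could look roughly like the sketch below. The directory layout and sample files are made up, only two of the sample patterns are used, and -E is added to the applying sed on the assumption that the patterns are extended regexes.

```shell
# Hypothetical sample tree in a scratch directory.
workdir=$(mktemp -d)
trap 'rm -rf "$workdir"' EXIT
mkdir "$workdir/out"

cat > "$workdir/comp.log" <<'EOF'
[a-z]+ mochaccino
[2-9] CheeseCakes
EOF

printf 'keep\none large mochaccino\n' > "$workdir/out/a.txt"
printf 'nothing to delete here\n' > "$workdir/out/b.txt"

# Turn each regex into a "/regex/ d;" command, as in the answer.
sed -E 's+(.*)+/\1/ d;+' "$workdir/comp.log" > "$workdir/comp.sed"

# With -l, grep prints only the names of matching files, one per line,
# so the list can be read in a loop (this breaks on names containing newlines).
grep -rlw -E -f "$workdir/comp.log" --include='*.txt' "$workdir/out" |
while IFS= read -r file; do
    sed -E -i.bak -f "$workdir/comp.sed" "$file"
done

cat "$workdir/out/a.txt"
```

Only files that grep reports are rewritten, so untouched files get no .bak copy. Note that the generated commands have no word-boundary check, so sed may delete lines that grep -w alone would not have counted as matches.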
Upvotes: 2