Reputation: 19

Delete multiple matching patterns from a file

Let's say I have a file containing:

abc aab100 100 cdc 20aaab aaaan
gshgds aab122 ghsgsd cdc aajksj aaasdan
gsgdsg hqusu jsdjsd jksjks jskdk
hjshj aab1jk uiuasu cdc 100ai bbcbxb
arta hyiosa jkulp nnnnnak cdc

I want to match two patterns, and if both patterns occur on a line, I want to delete that line.

The patterns I want to match are aab1 and cdc.

In the file above, both patterns match on lines 1, 2 and 4, so I would like to delete those 3 lines.

I can get the result I want with grep:

grep -v 'aab1.*cdc' test.txt > test1.txt

I can even do this on the same file, without writing to a new file, with:

echo "$(grep -v 'aab1.*cdc' test.txt)" > test.txt

But is there a better, faster or more efficient way of doing this without using grep?

Thanks

Upvotes: 1

Answers (5)

sjsam

Reputation: 21965

sed -n  '/aab1.*cdc/!p' test > test1

should also do it

In short

We check each line for the pattern aab1.*cdc and, if it is present, we don't print the line. Standard output is redirected to a file named test1.

Notes

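-n suppresses the default output (sed normally prints every line).
/pattern/ selects the lines that match the pattern.
p prints the line; ! negates the address, so /pattern/!p prints only the lines that do not match. Because of -n, these are the only lines printed.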
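As a rough check of the notes above, the sketch below runs the command on the sample data from the question and compares it with the equivalent d (delete) form. The scratch directory and the variable names with_n and with_d are only for illustration.

```shell
# Scratch directory so the sketch does not touch any real files
tmpdir=$(mktemp -d)

# Sample data copied from the question
cat > "$tmpdir/test" <<'EOF'
abc aab100 100 cdc 20aaab aaaan
gshgds aab122 ghsgsd cdc aajksj aaasdan
gsgdsg hqusu jsdjsd jksjks jskdk
hjshj aab1jk uiuasu cdc 100ai bbcbxb
arta hyiosa jkulp nnnnnak cdc
EOF

# Print only the lines that do NOT match (no automatic printing because of -n)
with_n=$(sed -n '/aab1.*cdc/!p' "$tmpdir/test")

# Same result without -n: delete the matching lines, let sed print the rest
with_d=$(sed '/aab1.*cdc/d' "$tmpdir/test")

printf '%s\n' "$with_n"
rm -r "$tmpdir"
```

Both forms should leave only the third and fifth lines of the sample.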

Using perl

perl -ni -e 'print unless (m/aab1/ && m/cdc/)' file

Notes

-n makes perl loop over the input lines without printing them automatically.
-i edits the file in place.
-e supplies the Perl code to run on the command line.
m/aab1/ && m/cdc/ matches (m) both (&&) of the patterns (/stuff/).
print unless makes sure a line is printed unless both patterns are present.

Another sed solution based on [ @tripleee's ] answer

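sed -n -e '/aab1/!{p;d;}' -e '/cdc/!p' test > test1

Notes

If aab1 does not match, the line is printed and sed moves on to the next line (d). Otherwise the line is printed only if cdc does not match, so only lines containing both patterns are dropped. This implements the branching as mentioned in this awk [ solution ].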

Upvotes: 1

karakfa

Reputation: 67557

If the order of the patterns is fixed (aab1 before cdc):

$ awk '!/aab1.*cdc/' file

If the patterns can appear in any order:

$ awk '!(/aab1/ && /cdc/)' file
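To see where the two forms differ, here is a small sketch. The file demo.txt and its three lines are made up for illustration; the second line has cdc before aab1.

```shell
# Hypothetical input: line 2 has the patterns in reverse order
tmpdir=$(mktemp -d)
printf '%s\n' 'x aab1 y cdc z' 'x cdc y aab1 z' 'x aab1 y' > "$tmpdir/demo.txt"

# Ordered regex: only removes lines where aab1 appears before cdc
ordered=$(awk '!/aab1.*cdc/' "$tmpdir/demo.txt")

# Two separate tests: removes every line that contains both patterns
unordered=$(awk '!(/aab1/ && /cdc/)' "$tmpdir/demo.txt")

printf 'ordered:\n%s\n\nunordered:\n%s\n' "$ordered" "$unordered"
rm -r "$tmpdir"
```

The ordered form keeps the reversed line, while the any-order form keeps only the line that lacks cdc.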

Upvotes: 1

tripleee

Reputation: 189820

Awk to the rescue.

awk '!/aab1/ || !/cdc/' file

If aab1 is not matched or cdc is not matched, (perform the default action, which is to) print the line.

This extends nicely to scenarios where you don't care about the order of the matches, which gets complex quickly if you are constrained to a single regex.

The same in sed:

sed -e '/aab1/!b' -e '/cdc/d' file

Generalizing to more than two patterns, if there is a mismatch on a pattern, skip the rest of this script for this line. If we reach the final regex, we matched all the patterns, so we delete this line. (Otherwise, we print.)
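A minimal sketch of that generalization, assuming a third pattern xyz that is not part of the question and a made-up input file: every pattern except the last gets a !b, and the last one gets d.

```shell
# Hypothetical input; xyz is an invented third pattern
tmpdir=$(mktemp -d)
printf '%s\n' 'aab1 cdc xyz' 'aab1 cdc' 'xyz cdc aab1' 'plain' > "$tmpdir/demo.txt"

# b with no label branches to the end of the script, where the line is printed.
# Only a line that matches every pattern reaches the final d and is deleted.
result=$(sed -e '/aab1/!b' -e '/cdc/!b' -e '/xyz/d' "$tmpdir/demo.txt")

printf '%s\n' "$result"
rm -r "$tmpdir"
```

Here only the two lines that contain all three patterns are removed, regardless of the order in which the patterns appear.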

Upvotes: 0

Andreas Louv

Reputation: 47137

Using grep for this task is fine. The main issue with your code is the command substitution, which loads the whole output of grep into memory. Consider using a temporary file:

grep -v 'aab1.*cdc' test.txt > tmp.txt && mv tmp.txt test.txt
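A slightly more defensive sketch of the same idea, assuming mktemp is available (it is common but not required by POSIX). The scratch directory, the two-line sample file and the variable names are only for illustration.

```shell
# Hypothetical scratch directory so the sketch does not touch real files
workdir=$(mktemp -d)
file="$workdir/test.txt"
printf '%s\n' 'abc aab100 100 cdc 20aaab aaaan' 'gsgdsg hqusu jsdjsd jksjks jskdk' > "$file"

# Create the temporary file next to the target so that mv stays on one filesystem
tmp=$(mktemp "$workdir/test.txt.XXXXXX")

# grep exits 0 if it printed lines, 1 if it printed none, and 2 or more on error
grep -v 'aab1.*cdc' "$file" > "$tmp"
status=$?
if [ "$status" -le 1 ]; then
    mv "$tmp" "$file"      # replace the original, even if the result is empty
else
    rm -f "$tmp"           # real error: keep the original untouched
fi

result=$(cat "$file")
printf '%s\n' "$result"
rm -r "$workdir"
```

The explicit status check is a design choice: with a plain &&, a run in which every line is deleted would leave the file unchanged, because grep -v exits with status 1 when it prints nothing.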

Alternatively, you can use sed with -i, which enables in-place editing (under the hood, sed uses a temporary file as well):

sed -i '/aab1.*cdc/d' test.txt

There is also sponge from moreutils, which soaks up all its input before opening the output file:

grep -v 'aab1.*cdc' test.txt | sponge test.txt

I can't tell you how it's implemented, though (using a temporary file or keeping the data in memory).

Upvotes: 2

SLePort

Reputation: 15461

With sed:

sed -i '/aab1.*cdc/d' file

The -i option is for editing the file in place.

Upvotes: 1
