
Find & replace multiple keywords defined string

I'm trying to remove the following string/line in my SQL database:

<p><span style="font-size:16px"><strong>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</span></strong></p>

The string will always start with <p> and end with </p>.
The string will always contain these words, in the same order: The, quick, brown. But they might be separated by something else (a space, &nbsp; or other HTML tags).
The string is part of a field with more text and nested HTML tags, so the solution must ignore the higher-level <p> tags.
We are talking about 20k+ matches, so no manual-edit solutions, please :)
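If "something else" is limited to whitespace, &nbsp; entities and HTML tags, the gaps between the words can be restricted to exactly those, which is stricter than a free .*? gap. A minimal Python sketch, assuming that separator set (the SEP and TARGET names and the sample field are illustrative, not from the question); the text after "brown" is still matched loosely, but never past the first </p>:

```python
import re

# Assumed separator set: whitespace, &nbsp; entities, or any single HTML tag.
SEP = r"(?:\s|&nbsp;|<[^<>]*>)*"

# <p>, then The / quick / brown with only SEP between them, then anything up to
# the first </p>; (?!</?p\b) stops the tail from running into another paragraph.
TARGET = re.compile(
    r"<p>" + SEP + r"The" + SEP + r"quick" + SEP + r"brown"
    + r"(?:(?!</?p\b).)*?</p>",
    re.IGNORECASE | re.DOTALL,
)

field = (
    "<p>Keep me</p>"
    '<p><span style="font-size:16px"><strong>The quick brown &nbsp;</strong></span>'
    '<strong><span style="font-size:16px">fox jumps.</span></strong></p>'
    "<p>Keep me too</p>"
)
print(TARGET.sub("", field))  # <p>Keep me</p><p>Keep me too</p>
```

The trade-off is that a paragraph with ordinary words between "The" and "quick" will not match, which is usually what you want for an exact boilerplate line.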

I have already tried doing it with RegExp but I can't filter for multiple keywords (AND operator).
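Regular expressions have no explicit AND operator, but there are two standard ways to require several keywords. When the order is fixed, as here, simply concatenating the keywords with lazy gaps already means "The, then quick, then brown". When the order is not fixed, one zero-width lookahead per keyword does the job. A small Python sketch on plain strings (the pattern names and samples are illustrative):

```python
import re

# Fixed order: concatenating the keywords with lazy gaps is an ordered AND.
IN_ORDER = re.compile(r"\bThe\b.*?\bquick\b.*?\bbrown\b", re.IGNORECASE | re.DOTALL)

# Any order: each lookahead checks for one keyword without consuming text,
# so a match at the start of the string requires all three to be present.
ANY_ORDER = re.compile(
    r"^(?=.*?\bThe\b)(?=.*?\bquick\b)(?=.*?\bbrown\b)", re.IGNORECASE | re.DOTALL
)

for s in ["The quick brown fox", "brown quick The", "The quick fox"]:
    print(repr(s), bool(IN_ORDER.search(s)), bool(ANY_ORDER.search(s)))
# 'The quick brown fox' True True
# 'brown quick The' False True
# 'The quick fox' False False
```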

I can export my DB to a .sql file, so I can use any solution you would recommend (Windows/Linux, text editor, JS script, etc.), but I would appreciate the simplest, most elegant one.
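Since the database can be exported, one low-risk route is to run the substitution over the dump and re-import the cleaned copy. A minimal Python sketch, assuming a UTF-8 dump small enough to read into memory; the function names are hypothetical, and the one-paragraph pattern is adapted from the first answer below. It returns the number of removed paragraphs so the count can be compared with the expected 20k+:

```python
import re
from pathlib import Path

# "Tempered" dot adapted from the first answer: any character that does not
# begin a <p ...> or </p> tag, so a match can never span two paragraphs.
NOT_P = r"(?:(?!</?p[^<]*>).)"

# One <p>...</p> containing "the", "quick" and "brown" in that order.
PARAGRAPH = re.compile(
    r"<p>" + NOT_P + r"*?the" + NOT_P + r"*?quick" + NOT_P + r"*?brown"
    + NOT_P + r"*?</p>",
    re.IGNORECASE,
)


def clean_text(text: str) -> tuple[str, int]:
    """Remove every matching paragraph; return the new text and the count."""
    return PARAGRAPH.subn("", text)


def clean_dump(src: Path, dst: Path) -> int:
    """Read a UTF-8 dump, write the cleaned copy to dst, return removals."""
    cleaned, count = clean_text(src.read_text(encoding="utf-8"))
    dst.write_text(cleaned, encoding="utf-8")
    return count
```

For example, clean_dump(Path("dump.sql"), Path("dump.cleaned.sql")) writes the cleaned copy and returns the count (both file names are placeholders). Many exporters escape quotes and newlines inside string values, so it is worth spot-checking a few rows in the output before re-importing.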

Upvotes: 1

Answers (3)

Wiktor Stribiżew

Reputation: 627087

I think you have to restrict .* with the less efficient but more precise tempered token (?:(?!<\/?p[^<]*>).)*, which forces the words to be matched inside a single <p> tag:

(?i)<p>(?:(?!<\/?p[^<]*>).)*the(?:(?!<\/?p[^<]*>).)*?quick(?:(?!<\/?p[^<]*>).)*?brown(?:(?!<\/?p[^<]*>).)*?<\/p>

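The difference the (?!<\/?p[^<]*>) lookahead makes can be checked in Python, whose re module supports both the leading (?i) flag and the lookahead. The sample field is made up for illustration: it has an unrelated paragraph containing "the" before the target, and a naive .*? version is shown for comparison.

```python
import re

# Pattern from the answer above (\/ is harmless in Python and matches "/").
TEMPERED = re.compile(
    r"(?i)<p>(?:(?!<\/?p[^<]*>).)*the(?:(?!<\/?p[^<]*>).)*?quick"
    r"(?:(?!<\/?p[^<]*>).)*?brown(?:(?!<\/?p[^<]*>).)*?<\/p>"
)
# Naive version: .*? is free to run across paragraph boundaries.
NAIVE = re.compile(r"(?i)<p>.*?the.*?quick.*?brown.*?</p>")

field = "<p>Read the notes.</p><p><strong>The quick brown</strong> fox jumps.</p><p>End</p>"

print(TEMPERED.sub("", field))  # <p>Read the notes.</p><p>End</p>
print(NAIVE.sub("", field))     # <p>End</p>
```

The naive pattern starts at the first <p>, finds "the" in "Read the notes." and then crosses into the next paragraph, so it deletes both.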

Upvotes: 1

karthik manchala

Reputation: 13640

You can use the following in any editor (say, Notepad++), in JavaScript, or in any PCRE engine, with the g, m and i modifiers, to match:

^<p>.*?the.*?quick.*?brown.*?<\/p>$

I used .* instead of .+ because you said the words MIGHT be separated by something else, so the gap can also be empty.

Then replace the matches with '' (an empty string).
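Because of the ^ and $ anchors with the m flag, this pattern only removes a paragraph that starts and ends a line; if one line holds several paragraphs, .*? can run across them and the whole line goes. A Python sketch with a made-up three-line sample shows both cases (Python has no g flag because re.sub already replaces every match):

```python
import re

# The answer's pattern, with m and i as re.MULTILINE and re.IGNORECASE.
LINE_PATTERN = re.compile(
    r"^<p>.*?the.*?quick.*?brown.*?<\/p>$", re.MULTILINE | re.IGNORECASE
)

text = (
    "<p>Intro</p>\n"
    "<p><strong>The quick brown &nbsp;</strong>fox jumps.</p>\n"
    "<p>Keep this</p><p>The quick brown fox.</p>\n"
)
result = LINE_PATTERN.sub("", text)
print(repr(result))  # '<p>Intro</p>\n\n\n'
```

The second line is removed as intended, but so is "<p>Keep this</p>" on the third line, because it shares a line with the target paragraph.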

Upvotes: 0

Yogesh_D

Reputation: 18809

The expression ^<p>.*The.*quick.*brown.*</p>$ worked for me (the backslash before $ in the command below is only shell escaping):

[root@fedora ~]# grep "^<p>.*The.*quick.*brown.*</p>\$" test1.txt
<p><span style="font-size:16px"><strong>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</span></strong></p>
<p><strong>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</span></strong></p>
<p>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</p>
[root@fedora ~]#
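grep only prints the matching lines; to remove them from an exported file you keep every line that does not match, which is what grep -v does. A minimal Python sketch of that filter with the same pattern (the function name and sample are illustrative). Note that it drops whole lines, so it is only safe when the paragraph sits on a line by itself; in a mysqldump-style export, where a line usually holds an entire INSERT statement, it would delete whole rows.

```python
import re

# Same pattern as the grep command above; grep is case-sensitive by default.
LINE = re.compile(r"^<p>.*The.*quick.*brown.*</p>$")


def drop_matching_lines(lines):
    """Keep only the lines the pattern does not match (like grep -v)."""
    return [line for line in lines if not LINE.search(line)]


sample = [
    "<p>Keep me</p>",
    "<p><strong>The quick brown &nbsp;</strong>fox jumps.</p>",
    '<p>The quick brown &nbsp;</strong></span><strong><span style="font-size:16px">fox jumps.</p>',
]
print(drop_matching_lines(sample))  # ['<p>Keep me</p>']
```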

Upvotes: 0
