yodish

Reputation: 763

Regex to match only when certain characters follow a string

I need to find a string that contains "script", with any number of characters before or after it, enclosed in < and >. I can do this with: <*script.*>

I also want to match only when that string is NOT followed by a <. The closest I've come so far is this: (<*script.*>)([^=?<*]*)$

However, that will fail for something like <script></script> because the last > isn't followed by a < (so it doesn't match).

How can I check whether only the first > is followed by a < or not?

For example, <script> abc () ; </script> MATCH

<< ScriPT >abc (”XXX”);//<</ ScriPT > MATCH

<script></script> DON'T MATCH

And a case that I am still working on: <script/script> DON'T MATCH

Thanks!

Upvotes: 1

Views: 118

Answers (3)

SactoJosh

Reputation: 376

You were close with your regex. You just needed to make the .* before the first > non-greedy by adding a ? after the second *. Try this out:

(?i)<*\s*script.*?>[^<]+<*[^>]+>

There is an app called Expresso that really helps with designing regex patterns. Give it a shot.

Explanation: Without the non-greedy ?, the .* before the first > is greedy. It runs all the way to the end of the string and then backs up only as far as the last >, so the match ends there. That leaves the rest of your pattern with nothing meaningful to check.
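To make the greedy versus non-greedy difference concrete, here is a minimal Python sketch. It uses only the first part of each pattern (the asker's <*script.*> and the same thing with the added ?), and the sample string is the first example from the question. The variable names are just for illustration.

```python
import re

text = "<script> abc () ; </script>"

# Greedy: .* runs to the end of the string, then backs up to the LAST >
greedy = re.search(r"<*script.*>", text)

# Non-greedy: .*? stops at the FIRST > it can reach
lazy = re.search(r"<*script.*?>", text)

print(repr(greedy.group(0)))  # the whole string, closing tag included
print(repr(lazy.group(0)))    # just the opening tag
```

With the greedy version, everything up to the final > is swallowed before the rest of the pattern gets a chance to run.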

EDIT: Added (?i) at the beginning for case-insensitivity. If you want a JavaScript-specific case-insensitive regex, you would write it like this:

/<*\s*script.*?>[^<]+<*[^>]+>/i

I noticed you have parentheses in your regex to make groups, but you didn't specifically say you were trying to capture groups. Do you want to capture what's between the <script> and </script>? If so, that would be:

/<*\s*script.*?>([^<]+)<*[^>]+>/i
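As a quick check, the capturing pattern can be run against the four examples from the question. This is a sketch in Python rather than JavaScript, with re.IGNORECASE standing in for the /i flag; the helper name captured_body is made up for the example.

```python
import re

# Same pattern as above; re.IGNORECASE plays the role of the /i flag
pattern = re.compile(r"<*\s*script.*?>([^<]+)<*[^>]+>", re.IGNORECASE)

samples = [
    "<script> abc () ; </script>",           # should match
    "<< ScriPT >abc (”XXX”);//<</ ScriPT >",  # should match
    "<script></script>",                     # should not match
    "<script/script>",                       # should not match
]

def captured_body(text):
    """Return the text captured after the first >, or None if there is no match."""
    m = pattern.search(text)
    return m.group(1) if m else None

results = [captured_body(s) for s in samples]
for sample, body in zip(samples, results):
    print(repr(sample), "->", repr(body))
```

The first two samples yield the text between the tags, and the last two yield None.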

Upvotes: 2

sniperd

Reputation: 5274

If I understand what you are looking for, give this a try:

regex = r"<\s*script\s*>([^<]+)<"

Here is an example in Python:

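import re

# Test inputs: the first has text between the tags, the second does not.
textlist = ["<script>show this</script>", "<script></script>"]

# Raw string so \s reaches the regex engine unchanged.
# The trailing < requires the captured text to be followed by a tag.
regex = r"<\s*script\s*>([^<]+)<"

for text in textlist:
    thematch = re.search(regex, text, re.IGNORECASE)
    if thematch:
        print("match found:")
        print(thematch.group(1))
    else:
        print("no match sir!")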

Explanation: start with <, then optional whitespace, the word script, optional whitespace, and a >. Then capture one or more characters that are not <, and make sure they are followed by a <.

Hope that helps!

Upvotes: 1

Reed

Reputation: 1642

This would be better solved with the JavaScript substring() and/or indexOf() methods.
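One way this might look, sketched in Python with str.find as the counterpart of indexOf(): locate "script", find the first > after it, and check the character that follows. The function name has_script_body is hypothetical, and this simplified version assumes the input is a single tag-like string; it does not verify that a < actually precedes "script".

```python
def has_script_body(text: str) -> bool:
    """Return True when the first > after "script" is followed by something other than <."""
    lowered = text.lower()            # case-insensitive search for "script"
    start = lowered.find("script")
    if start == -1:
        return False
    close = lowered.find(">", start)  # first > at or after "script"
    if close == -1 or close + 1 >= len(text):
        return False                  # no >, or nothing follows it
    return text[close + 1] != "<"

for sample in ["<script> abc () ; </script>", "<script></script>", "<script/script>"]:
    print(repr(sample), has_script_body(sample))
```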

Upvotes: -1
