ColinTyphoon81

Reputation: 1

Going backwards in regex Python

I have been working on this all day and can't find a solution. Here is my current code:

stranger = re.search(r"Stranger:</strong> <span>.+?</span></p></div></div></div>", html2)

I want a result like this:

"Stranger:</strong> <span>What now?</span></p></div></div></div>" = True

from a string like this:

"<div class=\"logitem\"><p class=\"strangermsg\"><strong class=\"msgsource\">Stranger:</strong> <span>Wow</span></p></div><div class=\"logitem\"><p class=\"youmsg\"><strong class=\"msgsource\">You:</strong> <span>Eek</span></p></div><div class=\"logitem\"><p class=\"strangermsg\"><strong class=\"msgsource\">Stranger:</strong> <span>What now?</span></p></div></div></div>"

Instead I get this:

"Stranger:</strong> <span>Wow</span></p></div><div class=\"logitem\"><p class=\"youmsg\"><strong class=\"msgsource\">You:</strong> <span>Eek</span></p></div><div class=\"logitem\"><p class=\"strangermsg\"><strong class=\"msgsource\">Stranger:</strong> <span>What now?</span></p></div></div></div>" = True

Basically, I want to get everything before the final "/span p div div div" and after the nearest preceding "span" (no /). I've tried all kinds of things, but I can't work out what to do. Can anyone help?

Upvotes: 0

Views: 50

Answers (1)

Alexander Wu

Reputation: 483

Try requiring that the text between the two inner tags contains no tag characters. For example,

stranger = re.search(r"Stranger:</strong> <span>[^<>]+?</span></p></div></div></div>", html2)

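This means the text between those two inner tags cannot contain any other < or > characters, so the match can no longer run across the tags of earlier messages. The lazy .+? does not help on its own, because re.search still starts at the leftmost possible position (the first "Stranger:") and then expands until the closing tags are found.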
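
To see the difference side by side, here is a small sketch. It assumes html2 holds the sample string from the question (rebuilt here with single quotes so no escaping is needed), and the capture group around the message text is my addition for convenience, not part of the pattern above.

```python
import re

# Sample input from the question, assumed to be the contents of html2.
html2 = (
    '<div class="logitem"><p class="strangermsg"><strong class="msgsource">'
    'Stranger:</strong> <span>Wow</span></p></div>'
    '<div class="logitem"><p class="youmsg"><strong class="msgsource">'
    'You:</strong> <span>Eek</span></p></div>'
    '<div class="logitem"><p class="strangermsg"><strong class="msgsource">'
    'Stranger:</strong> <span>What now?</span></p></div></div></div>'
)

# Original pattern: lazy, but the match still begins at the first "Stranger:".
lazy = re.search(
    r"Stranger:</strong> <span>.+?</span></p></div></div></div>", html2
)

# Restricted pattern: the message text may not contain < or >, so the first
# "Stranger:" cannot reach the closing tags and the search moves on to the last.
# The parentheses (an added capture group) expose just the message text.
restricted = re.search(
    r"Stranger:</strong> <span>([^<>]+?)</span></p></div></div></div>", html2
)

print(lazy.group(0)[:40])   # begins at the first Stranger message
print(restricted.group(0))  # only the last Stranger message
print(restricted.group(1))  # the message text by itself
```

Here restricted.group(1) gives just the text of the last Stranger message, which is usually what you want to work with afterwards.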

Upvotes: 1
