Reputation: 47
I have an HTML string that looks like this:
<img src="blah blah blah"><p> blah blah
blah blah blah blah blah blah
blah blah blah</p>
How can I read the "blah blah..." text inside the <p> tag using regex?
I tried (.+?) but it's not working, and I searched Google but didn't find a solution for Python.
Thanks!
Upvotes: 1
Views: 59
Reputation: 174726
You could also try the code below, which uses the (?s) DOTALL modifier so that . also matches newlines:
>>> s = """<img src="blah blah blah"><p> blah blah
... blah blah blah blah blah blah
... blah blah blah</p>"""
>>> import re
>>> m = re.search(r'(?s)(?<=<p>).*?(?=<\/p>)', s).group(0)
>>> print m
blah blah
blah blah blah blah blah blah
blah blah blah
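The same idea can be sketched for Python 3 with the re.DOTALL flag and a capture group instead of the inline (?s) modifier and lookarounds. This is only a minimal sketch; the variable names s, match and text are illustrative, and the sample string is the one from the question.

```python
import re

# Sample string from the question (the newlines are part of the string).
s = ('<img src="blah blah blah"><p> blah blah\n'
     'blah blah blah blah blah blah\n'
     'blah blah blah</p>')

# re.DOTALL makes "." match newlines as well.
# The non-greedy ".*?" stops at the first closing </p>.
match = re.search(r"<p>(.*?)</p>", s, flags=re.DOTALL)

# Guard against "no match" instead of calling .group() on None.
text = match.group(1) if match else None
print(text)
```

Without re.DOTALL (or (?s)), the dot does not match "\n", which is why a plain (.+?) fails on multi-line content.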
Upvotes: 0
Reputation: 41838
With the usual disclaimers about using regex to parse HTML, this will work:
import re
# subject is the HTML string to search
match = re.search(r"<img[^>]*><p>([^<]*)</p>", subject)
if match:
    blahblah = match.group(1)
    print(blahblah)
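For reference, here is a self-contained version of that snippet, assuming subject holds the sample string from the question. The .strip() call is an addition for illustration (it drops the space after <p>) and is not part of the original answer.

```python
import re

# Assumed input: the sample string from the question.
subject = ('<img src="blah blah blah"><p> blah blah\n'
           'blah blah blah blah blah blah\n'
           'blah blah blah</p>')

# [^<]* is a negated character class, so it matches newlines too;
# no DOTALL flag is needed for this pattern.
match = re.search(r"<img[^>]*><p>([^<]*)</p>", subject)

# strip() is optional: it removes the leading space after <p>.
blahblah = match.group(1).strip() if match else None
print(blahblah)
```

Note that this pattern only matches when the <p> contains no nested tags, because [^<]* stops at the first "<".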
Explanation
<img matches literal chars
[^>]* matches any chars that are not >
><p> matches literal chars
([^<]*) captures any chars that are not < to Group 1 (this is what we want)
</p> matches literal chars
match.group(1) contains our string
Upvotes: 2
Reputation: 271
Here is an example in Java:
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public static void testRegExp() {
    try {
        String input = "<img src=\"blah blah blah\"><p> blah blah" +
                "\n blah blah blah blah blah blah" +
                "\nblah blah blah</p>";
        // One or more repetitions of "blah" followed by whitespace
        Pattern pMod = Pattern.compile("(blah\\s+)+");
        Matcher mMod = pMod.matcher(input);
        // Print every match, preceded by a separator line
        while (mMod.find()) {
            System.out.println("--------------");
            System.out.println(mMod.group(0));
        }
    } catch (Exception ex) {
        ex.printStackTrace();
    }
}
blah blah blah blah blah blah blah blah blah blah
For Python, I guess the regular expression is similar. Good luck, and give it a try.
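A rough Python equivalent of the Java loop, as a sketch only: it reuses the same pattern with re.finditer, and the names text, pattern and matches are illustrative.

```python
import re

# Same input as the Java example.
text = ('<img src="blah blah blah"><p> blah blah'
        '\n blah blah blah blah blah blah'
        '\nblah blah blah</p>')

# One or more repetitions of "blah" followed by whitespace.
pattern = re.compile(r"(?:blah\s+)+")

# finditer yields one match object per non-overlapping match.
matches = [m.group(0) for m in pattern.finditer(text)]
for found in matches:
    print("--------------")
    print(found)
```

Be aware that this pattern also matches the text inside the src attribute, and it skips any "blah" that is not followed by whitespace (such as the one right before </p>), so it is less precise than matching on the <p> tags.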
Upvotes: 0