Reputation: 14008
I need a regular expression to match anything within <p> tags. For example, if I had the text:
<p>Hello world</p>
the regex should match the "Hello world" part.
Upvotes: 6
Views: 32079
Reputation: 130
For anyone looking for this regex, or for a regex that matches some other specific HTML tag, the regex below should work:
<\s*p[^>]*>(.*?)<\s*\/\s*p\s*>
It matches strings like the following, as mentioned in xzyfer's answer:
<p>I would like <b>all</b> the text!</p>
< p style= "font-weight: bold;" >Hello world < / p >
Link to the regex on Regex101: https://regex101.com/r/kjpLII
If you would like to use the regex for other HTML tags instead of just p tags, change the p's in the regex to whichever HTML tag you want to match:
<\s*div[^>]*>(.*?)<\s*\/\s*div\s*>
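Rather than editing the pattern by hand for each tag, it can be built from a tag name. The sketch below does that in Python; the helper name tag_contents is hypothetical, and re.escape plus re.DOTALL (so the content may span several lines) are additions that are not part of the answer's pattern.

```python
import re
from typing import List


def tag_contents(html: str, tag: str) -> List[str]:
    """Return the inner text of every <tag>...</tag> pair (hypothetical helper).

    Uses the answer's pattern with the tag name substituted in. As in the
    original, the opening part for "p" also matches tags like <path> or <pre>.
    """
    name = re.escape(tag)  # guard against regex metacharacters in the name
    # re.DOTALL is an addition: it lets "." match newlines inside the content
    pattern = re.compile(rf"<\s*{name}[^>]*>(.*?)<\s*/\s*{name}\s*>", re.DOTALL)
    return pattern.findall(html)


print(tag_contents("<p>I would like <b>all</b> the text!</p>", "p"))
# ['I would like <b>all</b> the text!']
print(tag_contents('< p style= "font-weight: bold;" >Hello world < / p >', "p"))
# ['Hello world ']  (the space before "< / p >" is kept)
print(tag_contents('<div class="box">hi</div>', "div"))
# ['hi']
```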
Upvotes: 0
Reputation: 121
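As a more comprehensive solution in Python, you can parse the HTML with BeautifulSoup instead of using a regex:
import bs4
import requests

link = "https://example.com"  # placeholder URL: replace with the page you want to read
page = requests.get(link)
page_content = bs4.BeautifulSoup(page.content, 'html.parser')
result = page_content.find_all('p')  # list of <p> Tag objects
texts = [p.get_text() for p in result]  # the text inside each <p>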
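If installing BeautifulSoup is not an option, or you just want to try the parser idea offline, the standard library's html.parser.HTMLParser can do a simpler version of the same job. This is a swapped-in technique rather than the answer's code, and the names ParagraphTextCollector and paragraph_texts are hypothetical. Because a parser tokenizes the markup, comments are skipped and <path> is never confused with <p>.

```python
from html.parser import HTMLParser
from typing import List


class ParagraphTextCollector(HTMLParser):
    """Collects the text inside each <p> element (hypothetical helper class)."""

    def __init__(self) -> None:
        super().__init__()
        self.in_p = False                # True while between <p ...> and </p>
        self.current: List[str] = []     # text chunks of the open paragraph
        self.paragraphs: List[str] = []  # finished paragraph texts

    def handle_starttag(self, tag, attrs):
        # Tag names arrive lowercased, so <P> and <p class="x"> both count;
        # <path> arrives as "path" and is ignored.
        if tag == "p":
            self.in_p = True
            self.current = []

    def handle_endtag(self, tag):
        if tag == "p" and self.in_p:
            self.paragraphs.append("".join(self.current))
            self.in_p = False

    def handle_data(self, data):
        # Text of nested tags such as <b> or <a> is collected too. Comments go
        # to handle_comment, which is not overridden here, so they are skipped.
        if self.in_p:
            self.current.append(data)


def paragraph_texts(html: str) -> List[str]:
    parser = ParagraphTextCollector()
    parser.feed(html)
    parser.close()
    return parser.paragraphs


print(paragraph_texts("<p>I would like <b>all</b> the text!</p>"))
# ['I would like all the text!']
print(paragraph_texts('<path d="M0"></path><p class="content">Hello world</p>'))
# ['Hello world']
print(paragraph_texts("<p>a<!-- </p> --> b</p>"))
# ['a b']
```

Unlike BeautifulSoup, this sketch does not handle paragraphs whose closing </p> is omitted, which HTML allows.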
Upvotes: 2
Reputation: 71
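It seems that each of the solutions proposed above fails on at least one of these cases:
- <p>...</p> tags that contain other tags, like <a>, <em>, etc.
- telling <p> apart from other tags that start with p, such as <path>
- an opening tag with attributes, such as <p class="content">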
Consider using this regex:
<p(|\s+[^>]*)>(.*?)<\/p\s*>
The resulting text is captured in group 2.
Obviously, this solution won't work properly if a closing </p> tag is, for some reason, enclosed in a comment, as in <p> ... <!-- ... </p> ... -->
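A quick Python check of this pattern next to the looser <\s*p[^>]*> pattern from an earlier answer; the sample strings are made up for illustration. It shows the <path> case being skipped, attributes and nested tags being handled, and the comment limitation described above.

```python
import re

# The answer's pattern: group 1 is the (possibly empty) attribute text,
# group 2 is the content between <p ...> and </p>.
P_TAG = re.compile(r"<p(|\s+[^>]*)>(.*?)<\/p\s*>")
# The looser pattern from an earlier answer, for comparison.
LOOSE = re.compile(r"<\s*p[^>]*>(.*?)<\s*/\s*p\s*>")

html = '<path d="M0"></path><p class="content">Hi <a href="#">there</a></p>'

strict = [m.group(2) for m in P_TAG.finditer(html)]
loose = LOOSE.findall(html)
print(strict)  # ['Hi <a href="#">there</a>']
print(loose)   # ['</path><p class="content">Hi <a href="#">there</a>']  (<path> was taken as <p>)

# A </p> hidden inside a comment still ends the match early.
commented = [m.group(2) for m in P_TAG.finditer("<p>a <!-- </p> --> b</p>")]
print(commented)  # ['a <!-- ']
```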
Upvotes: 7
Reputation: 14135
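In JavaScript:
var str = "<p>Hello world</p>";
var match = str.match(/<\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/);  // match[1] is "Hello world" (search() would only return the index)
In PHP:
$str = "<p>Hello world</p>";
preg_match_all("/<\s*p[^>]*>([^<]*)<\s*\/\s*p\s*>/", $str, $matches);  // $matches[1] holds every group-1 capture, here ["Hello world"]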
These will also match something as complex as this:
< p style= "font-weight: bold;" >Hello world < / p >
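The same pattern ported to Python's re module, as a sketch for checking its behavior rather than part of the original answer. Because the content group is [^<]*, it copes with the extra whitespace and attributes, but it skips a paragraph that contains a nested tag entirely, and the captured text keeps the space before "< / p >".

```python
import re

# Same pattern as the JavaScript/PHP versions; [^<]* forbids any "<" in the content.
PATTERN = re.compile(r"<\s*p[^>]*>([^<]*)<\s*/\s*p\s*>")

simple = PATTERN.findall("<p>Hello world</p>")
spaced = PATTERN.findall('< p style= "font-weight: bold;" >Hello world < / p >')
nested = PATTERN.findall("<p>I would like <b>all</b> the text!</p>")

print(simple)  # ['Hello world']
print(spaced)  # ['Hello world ']
print(nested)  # []  (the <b> tag stops [^<]*, so nothing matches)
```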
Upvotes: 11
Reputation: 39628
EDIT: Don't do it. Just don't.
See this question
If you insist, use <p>(.+?)</p>
and the result will be in the first group. It is not perfect, but no regex solution to an HTML parsing problem ever will be.
E.g. (in Python):
>>> import re
>>> r = re.compile('<p>(.+?)</p>')
>>> r.findall("<p>fo o</p><p>ba adr</p>")
['fo o', 'ba adr']
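A few concrete ways the <p>(.+?)</p> pattern falls short, in the same Python style; the input strings are invented for illustration.

```python
import re

r = re.compile(r"<p>(.+?)</p>")

# .+? needs at least one character, so an empty paragraph swallows the next one.
empty_plus = r.findall("<p></p><p>x</p>")
empty_star = re.findall(r"<p>(.*?)</p>", "<p></p><p>x</p>")  # .*? allows empty content

# Without re.DOTALL, "." does not match a newline, so multi-line content is missed.
no_dotall = r.findall("<p>a\nb</p>")
with_dotall = re.findall(r"<p>(.+?)</p>", "<p>a\nb</p>", re.DOTALL)

# The literal "<p>" also rejects any opening tag with attributes.
with_attrs = r.findall('<p class="x">y</p>')

print(empty_plus)   # ['</p><p>x']
print(empty_star)   # ['', 'x']
print(no_dotall)    # []
print(with_dotall)  # ['a\nb']
print(with_attrs)   # []
```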
Upvotes: 7
Reputation: 274878
Regex:
<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>
This will work for any pair of tags, e.g.:
<p class="foo">hello<br/></p>
The \1 makes sure that the opening tag matches the closing tag.
The content between the tags is captured in \2.
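A short Python sketch of the backreference pattern, run on the example above plus two made-up strings. It also shows two limits: when the same tag is nested inside itself, the lazy group stops at the first closing tag, and uppercase tag names are not matched unless re.IGNORECASE is added.

```python
import re

# The answer's pattern: group 1 is the tag name, \1 forces the same closing tag,
# group 2 is the content.
PAIR = re.compile(r"<([a-z][a-z0-9]*)\b[^>]*>(.*?)</\1>")

example = PAIR.findall('<p class="foo">hello<br/></p>')
mixed = PAIR.findall("<p>one</p><b>two</b>")
nested = PAIR.findall("<div><div>a</div></div>")  # same tag nested in itself
upper = PAIR.findall("<P>x</P>")                   # [a-z] is lowercase only

print(example)  # [('p', 'hello<br/>')]
print(mixed)    # [('p', 'one'), ('b', 'two')]
print(nested)   # [('div', '<div>a')]
print(upper)    # []
```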
Upvotes: 1