Reputation: 533
I am planning to move one of my scrapers to Python. I am comfortable using preg_match and preg_match_all in PHP, but I cannot find a suitable function in Python similar to preg_match. Could anyone please help me with this?
For example, if I want to get the content between <a class="title" and </a>, I use the following function in PHP:
preg_match_all('/a class="title"(.*?)<\/a>/si',$input,$output);
Whereas in Python I am not able to figure out a similar function.
Upvotes: 16
Views: 27640
Reputation: 535
I think you need something like this:
import re
output = re.search(r'a class="title"(.*?)</a>', input, flags=re.IGNORECASE)
if output is not None:
    # group(0) is the whole match; group(1) is only the text captured by (.*?)
    output = output.group(0)
    print(output)
You can add (?s) at the start of the regex to enable DOTALL mode, so that . also matches newlines (the equivalent of the s modifier in the PHP pattern):
output = re.search(r'(?s)a class="title"(.*?)</a>', input, flags=re.IGNORECASE)
if output is not None:
    output = output.group(0)
    print(output)
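To make the difference between group(0) and group(1) concrete, here is a minimal sketch. The sample string `html` is made up for illustration and is not from the question; everything else uses the standard re module.

```python
import re

# Hypothetical sample input containing a newline inside the link text.
html = '<a class="title" href="/first">First\nstory</a>'

# (?s) is the inline form of re.DOTALL: "." also matches newlines.
match = re.search(r'(?s)a class="title"(.*?)</a>', html, flags=re.IGNORECASE)

if match is not None:
    whole = match.group(0)   # entire matched text, from 'a class' through '</a>'
    inner = match.group(1)   # only what the (.*?) group captured
    print(whole)
    print(inner)
```

If only the content between the two markers is wanted, group(1) is probably the one to use. Without (?s) or re.DOTALL this sample would not match at all, because the link text spans two lines.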
Upvotes: 5
Reputation: 49567
You are looking for Python's re module.
Take a look at re.findall and re.search.
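As a rough sketch of how re.findall lines up with preg_match_all, assuming the pattern from the question and a made-up `page` string (not from the original post):

```python
import re

# Hypothetical sample input, not from the original question.
page = '''
<a class="title" href="/one">One</a>
<A CLASS="title" href="/two">Two
lines</A>
'''

# re.S (DOTALL) and re.I (IGNORECASE) mirror the /si modifiers in the PHP pattern.
pattern = re.compile(r'a class="title"(.*?)</a>', re.S | re.I)

# With a single capturing group, findall returns a list of that group's text.
captures = pattern.findall(page)

# finditer yields match objects when the full match or its position is needed too.
spans = [m.span() for m in pattern.finditer(page)]

print(captures)
```

Here re.search plays the role of preg_match (first match only), while re.findall or re.finditer cover preg_match_all.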
And since you mentioned that you are trying to parse HTML, use an HTML parser
for that. There are a couple of options available in Python, such as lxml or BeautifulSoup.
Take a look at "Why you should not parse HTML with regex".
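For a feel of the parser approach without installing anything, here is a minimal sketch using the standard library's html.parser instead of lxml or BeautifulSoup (a swap made only to keep the example dependency-free). The class name `TitleLinkParser` and the sample `page` are made up for illustration.

```python
from html.parser import HTMLParser


class TitleLinkParser(HTMLParser):
    """Collects the text inside every <a> element whose class includes "title"."""

    def __init__(self):
        super().__init__()
        self.titles = []        # finished link texts
        self._parts = None      # text chunks of the link being read, or None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; tag names arrive lowercased.
        if tag == "a" and "title" in (dict(attrs).get("class") or "").split():
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._parts is not None:
            self.titles.append("".join(self._parts).strip())
            self._parts = None


# Hypothetical sample input.
page = '<p><a class="title" href="/one">One</a> <a href="/x">skip</a> <a class="big title">Two</a></p>'

parser = TitleLinkParser()
parser.feed(page)
parser.close()
print(parser.titles)
```

Unlike the regex, this does not depend on attribute order or on class="title" being the only class. lxml and BeautifulSoup offer the same idea with far less code.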
Upvotes: 14
Reputation: 26861
You might be interested in reading the Python documentation on regular expression operations (the re module).
Upvotes: 2