icebox19

Reputation: 523

How to get multiple regex matches in python?

I have this text:

 <div class="additional-details">
  <div class="mark-container">
   <input type="checkbox" id="comp-80174649" value="80174649"
          data-heading-code="2550"/>
   <label for="comp-80174649">???</label>
   <a href="#" class="compare-link" id="compare-link-1"
      data-compare="/80174649/2550/"
      data-drop-down-id="compare-content-1"
      data-drop-down-content-id="compare-content"
      data-drop-down-class="drop-down-compare"
      etc...
      data-compare="/8131239/2550/"

I am trying to scrape what is inside data-compare="HERE" (I have multiple matches).

I know how to do this in C# with a MatchCollection, but in Python I am confused by re.search and re.match. I've also noticed that the regex that works in C# does not work as-is in Python.

Could somebody explain how to do this?

Upvotes: 2

Views: 681

Answers (1)

vivekagr

Reputation: 1836

re.findall returns all non-overlapping matches as a list.

>>> import re
>>> s = '<div cla'  # whole string here
>>> # raw string (r'...') so that \d is passed to the regex engine unchanged
>>> result = re.findall(r'data-compare="([\d/]+)"', s)
>>> print(result)
['/80174649/2550/', '/8131239/2550/']
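For a version that can be pasted and run as-is, here is a minimal Python 3 sketch. The `html` string is a shortened stand-in for the markup in the question (an assumption, since the full page is not shown), and the name `matches` is only illustrative.

```python
import re

# Shortened stand-in for the question's markup (assumption: the real page
# has more attributes between the two data-compare values).
html = '''
<a href="#" class="compare-link" id="compare-link-1"
   data-compare="/80174649/2550/"
   data-drop-down-id="compare-content-1"
   data-compare="/8131239/2550/"
'''

# With one capture group, findall returns just the captured text
# of every non-overlapping match, in order of appearance.
matches = re.findall(r'data-compare="([\d/]+)"', html)
print(matches)  # ['/80174649/2550/', '/8131239/2550/']
```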

Explanation

The desired output, such as '/80174649/2550/', contains only digits and forward slashes, so the pattern targets only those characters.

In ([\d/]+), the character class [\d/] matches a single character that is either a digit (\d) or a forward slash (/).

The + means that the preceding [\d/] must occur one or more times, since each value contains several digits and slashes.

The enclosing parentheses form a capture group. When a pattern has exactly one group, re.findall returns only the text captured by that group (here, [\d/]+) instead of the whole match, so the surrounding data-compare="..." is left out.
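Since the question mentions confusion between re.search and re.match, the following sketch contrasts them with re.finditer, which is probably the closest analogue to iterating a C# MatchCollection. The sample `text` and the variable names are made up for illustration.

```python
import re

# Illustrative sample, not the full markup from the question.
text = '<a data-compare="/80174649/2550/" data-compare="/8131239/2550/">'
pattern = re.compile(r'data-compare="([\d/]+)"')

# match() only tries at the very start of the string, so it fails here
# because the string begins with '<a ' rather than 'data-compare'.
at_start = pattern.match(text)

# search() scans forward and returns only the first match (or None).
first = pattern.search(text)

# finditer() yields a match object for every non-overlapping match,
# giving access to groups and positions, not just the captured text.
found = [(m.group(1), m.start()) for m in pattern.finditer(text)]

print(at_start)        # None
print(first.group(1))  # /80174649/2550/
for value, position in found:
    print(position, value)
```

Use re.findall when only the captured strings are needed, and re.finditer when the match positions or several groups per match matter.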

Upvotes: 1
