Reputation: 33

sre_constants.error: nothing to repeat in Jython

I have HTML content and want to extract the comments from it:

content = """<html>
<body>
<!--<h1>test</h1>-->
<!--<div>
    <img src='x'>
  </div>-->

<!--
  <div>
    <img src='xe'>
  </div>
-->
</body>
</html>"""

I use this regex:

regex_str = "<!--((\n|\r)+)?((.*?)+((\n|\r)+)?)+-->"

When I run this line in Python:

re.findall(regex_str,content)

it works, but when I run it in Jython, this error appears:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/share/java/jython-2.7.2.jar/Lib/re$py.class", line 177, in findall
  File "/usr/share/java/jython-2.7.2.jar/Lib/re$py.class", line 242, in _compile
sre_constants.error: nothing to repeat

Upvotes: 3

Answers (1)

Ryszard Czech

Reputation: 18641

Use this pattern instead:

<!--[\n\r]*([\w\W]*?)[\n\r]*-->

See regex proof.
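
As for why the original pattern fails: it applies `+` to groups that can match the empty string, such as `(.*?)+`. Older versions of the `sre` compiler reject that construct with "nothing to repeat", and Jython 2.7.2 appears to behave the same way. This is an assumption based on the traceback, not something verified against the Jython source. A minimal sketch for checking which fragment an interpreter refuses (the `try_compile` helper and the fragment list are illustrative, not part of any library):

```python
import re

# Fragments taken from the question's regex, from smallest to largest.
fragments = [
    r"(\n|\r)+",
    r"((\n|\r)+)?",
    r"(.*?)+",                  # '+' applied to a group that can match ''
    r"((.*?)+((\n|\r)+)?)+",
]

def try_compile(pattern):
    """Return None if the pattern compiles, else the error message."""
    try:
        re.compile(pattern)
        return None
    except re.error as exc:
        return str(exc)

results = dict((p, try_compile(p)) for p in fragments)

for pattern in fragments:
    print("%-28s %s" % (pattern, results[pattern] or "ok"))
```

On a current CPython every fragment should report `ok`; on Jython, any fragment that reports an error is the part to rewrite.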

EXPLANATION

--------------------------------------------------------------------------------
  <!--                     '<!--'
--------------------------------------------------------------------------------
  [\n\r]*                  any character of: '\n' (newline), '\r'
                           (carriage return) (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    [\w\W]*?                 any character of: word characters (a-z,
                             A-Z, 0-9, _), non-word characters (all
                             but a-z, A-Z, 0-9, _) (0 or more times
                             (matching the least amount possible))
--------------------------------------------------------------------------------
  )                        end of \1
--------------------------------------------------------------------------------
  [\n\r]*                  any character of: '\n' (newline), '\r'
                           (carriage return) (0 or more times
                           (matching the most amount possible))
--------------------------------------------------------------------------------
  -->                      '-->'
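
A short usage sketch with the sample content from the question. It avoids nested quantifiers, so it should compile in Jython as well as in CPython, though I have only reasoned about it against the standard `re` module; the variable names are illustrative:

```python
import re

content = """<html>
<body>
<!--<h1>test</h1>-->
<!--<div>
    <img src='x'>
  </div>-->

<!--
  <div>
    <img src='xe'>
  </div>
-->
</body>
</html>"""

# [\w\W] matches any character including newlines, so no DOTALL flag is needed.
comment_re = re.compile(r"<!--[\n\r]*([\w\W]*?)[\n\r]*-->")

# findall returns the contents of group 1 for every comment found.
comments = comment_re.findall(content)

for body in comments:
    print(repr(body))
```

Each entry in `comments` is the comment body with the line breaks next to `<!--` and `-->` stripped; indentation inside the body is kept.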

Upvotes: 2
