Robert Johnstone

Reputation: 5371

Extracting text using regex in Python

How do I extract roundUp(...) using regex (or some other derivative) from the following possible permutations:

[[[ roundUp( 10.0 ) ]]]
[[[ roundUp( 10.0 + 2.0 ) ]]]
[[[ roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]

The reason I'm asking is that I would like to replace roundUp(...) with math.ceil((...)*100)/100.0 in my code, but I'm not too sure how to do it because brackets may be used multiple times inside the call to help with operator precedence.

Upvotes: 1

Views: 97

Answers (2)

Mu Mind

Reputation: 11214

You can't solve the general case with regular expressions. Regular expressions are not powerful enough to represent anything analogous to a stack, such as parentheses or XML tags nested to arbitrary depth.
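To illustrate the limitation, here is a minimal sketch of what goes wrong with a plain non-greedy pattern. The names NAIVE and naive_replace are made up for this example and are not part of the question's code:

```python
import re

# Non-greedy match: the captured argument ends at the FIRST ')' after "roundUp(".
NAIVE = re.compile(r'roundUp\((.*?)\)')

def naive_replace(text):
    # Wrap whatever the regex captured, without checking that parentheses balance.
    return NAIVE.sub(
        lambda m: 'math.ceil((' + m.group(1).strip() + ')*100)/100.0', text)

# Works while the argument contains no parentheses of its own.
print(naive_replace('roundUp( 10.0 )'))
# Breaks once the argument is nested: the match stops after "(10.0 * 2.0".
print(naive_replace('roundUp( (10.0 * 2.0) + 2.0 )'))
```

The second call cuts the argument off at the inner closing parenthesis, so the "+ 2.0 )" part ends up outside the math.ceil(...) call and the result is no longer a balanced expression.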

If you are solving the problem in Python, you can do something like this:

import re

def roundup_sub(m):
    # m.group(1) is everything after the opening "roundUp(".
    rest = m.group(1)
    close_paren_index = None
    level = 1    # the "(" of "roundUp(" is already open
    for i, c in enumerate(rest):
        if c == '(':
            level += 1
        elif c == ')':
            level -= 1
            if level == 0:
                close_paren_index = i
                break
    if close_paren_index is None:
        raise ValueError("Unclosed roundUp()")
    # Wrap the argument, drop the matching ')', and keep everything after it.
    return ('math.ceil((' + rest[:close_paren_index].strip() + ')*100)/100.0'
            + rest[close_paren_index + 1:])

def replace_every_roundup(text):
    # Each pass rewrites the first remaining "roundUp("; stop when nothing changes.
    while True:
        new_text = re.sub(r'(?ms)roundUp\((.*)', roundup_sub, text)
        if new_text == text:
            return text
        text = new_text

This passes a function as the repl argument of re.sub: the regex only finds where each roundUp( begins, and the Python code matches the parentheses to decide where the substitution ends.

An example of using it:

my_text = """[[[ roundUp( 10.0 ) ]]]
[[[ roundUp( 10.0 + 2.0 ) ]]]
[[[ roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) ]]]
[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]"""
print(replace_every_roundup(my_text))

which gives you the output

[[[ math.ceil((10.0)*100)/100.0 ]]]
[[[ math.ceil((10.0 + 2.0)*100)/100.0 ]]]
[[[ math.ceil(((10.0 * 2.0) + 2.0)*100)/100.0 ]]]
[[[ 10.0 + math.ceil(((10.0 * 2.0) + 2.0)*100)/100.0 ]]]
[[[ 10.0 + math.ceil(((10.0 * 2.0) + 2.0)*100)/100.0 + 20.0 ]]]

Another option would be to implement a regex that handles up to a certain depth of nested parentheses.
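A rough sketch of that bounded-depth approach, assuming you know the maximum nesting you need to support. The helper names nested_parens_pattern and replace_roundup_bounded, and the default depth of 2, are invented for this illustration:

```python
import re

def nested_parens_pattern(depth):
    """Regex fragment for text whose parentheses nest at most `depth` deep."""
    pattern = r'[^()]*'    # depth 0: no parentheses at all
    for _ in range(depth):
        # Either a non-paren character, or a parenthesized group one level shallower.
        pattern = r'(?:[^()]|\(' + pattern + r'\))*'
    return pattern

def replace_roundup_bounded(text, depth=2):
    regex = re.compile(r'roundUp\((' + nested_parens_pattern(depth) + r')\)')
    return regex.sub(
        lambda m: 'math.ceil((' + m.group(1).strip() + ')*100)/100.0', text)

print(replace_roundup_bounded('[[[ 10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0 ]]]'))
```

A call whose argument nests deeper than `depth` simply fails to match and is left untouched, so you would want to check the result for leftover roundUp( occurrences. A roundUp nested inside another roundUp would also need a second pass, since re.sub only replaces non-overlapping matches.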

Upvotes: 1

wim

Reputation: 363405

This is Python, so why not just rebind the name roundUp?

import math

def my_roundup(x):
    return math.ceil(x*100)/100.0

roundUp = my_roundup
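As a quick check of the idea, here is a small sketch showing the original expressions working unchanged once the name is rebound. It assumes the expressions are trusted strings that end up being evaluated as Python, which the question does not actually say; the evaluate helper is hypothetical:

```python
import math

def my_roundup(x):
    # Round up to two decimal places.
    return math.ceil(x * 100) / 100.0

roundUp = my_roundup

def evaluate(expression):
    # Hypothetical helper: evaluates a TRUSTED expression string with only
    # roundUp available. Do not use eval on untrusted input.
    return eval(expression, {"__builtins__": {}, "roundUp": my_roundup})

print(roundUp(10.001))
print(evaluate("10.0 + roundUp( (10.0 * 2.0) + 2.0 ) + 20.0"))
```

No text rewriting is needed in this case, because the parenthesized argument is parsed by Python itself.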

Upvotes: 5
