Sequoya

Reputation: 1425

Python regex group separator

This is the output I have: ['5', '+', '4X1', '-', '9.3X2']

The output I want is: ['5', '+4X1', '-9.3X2']

How can I achieve that?

import re
import sys

class bcolors:
    HEADER = '\033[95m'
    OKBLUE = '\033[94m'
    OKGREEN = '\033[92m'
    WARNING = '\033[93m'
    FAIL = '\033[91m'
    ENDC = '\033[0m'
    BOLD = '\033[1m'
    UNDERLINE = '\033[4m'


def parse(str):
    for ch in[' ', '^', '*', 'X0']:
        if ch in str:
            str = str.replace(ch, '')
    str = str.split(('='))
    left = str[0]
    right = str[1]
    left = re.split("(\+|\-)", left)
    print(left)

if __name__ == '__main__':
    if len(sys.argv) == 2:
        parse(sys.argv[1])
    else:
        print ("please enter your string in one argument in this form: \n\t"
        + bcolors.OKGREEN + "5 * X^0 + 4 * X^1 - 9.3 * X^2 = 1 * X^0" + bcolors.ENDC)

Thank you for any help!

Upvotes: 0

Views: 919

Answers (3)

vks

Reputation: 67968

You need to split on a zero-width assertion. Before Python 3.7, re.split did not split on empty matches, so this requires the third-party regex module instead of re.

import regex  # third-party module: pip install regex

x = "5+4X1-9.3X2"
print(regex.split(r"(?=[+-])", x, flags=regex.VERSION1))

Output: ['5', '+4X1', '-9.3X2']
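
For readers on Python 3.7 or later, the standard re module can also split on an empty match, so the same lookahead should work without the extra package. A minimal sketch, assuming a 3.7+ interpreter; the helper name split_terms is made up for illustration:

```python
import re


def split_terms(expr):
    """Split before each '+' or '-' so the sign stays with its term."""
    # (?=[+-]) is a zero-width lookahead: it matches the empty string
    # just before a sign, so the split consumes no characters.
    return re.split(r"(?=[+-])", expr)


print(split_terms("5+4X1-9.3X2"))  # ['5', '+4X1', '-9.3X2']
```

Note that an expression starting with a sign would produce an empty first item, which you may want to filter out.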

Upvotes: 0

Blckknght

Reputation: 104722

Your immediate issue is that the line left = re.split("(\+|\-)", left) includes the + or - symbols in the output as separate items, while you want the symbol to be combined with the number that follows it.
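
To make the effect of the capturing group concrete, here is a small sketch comparing the two forms of re.split on the cleaned-up string from the question (the variable names are only for illustration):

```python
import re

expr = "5+4X1-9.3X2"

# With a capturing group, the matched separators are returned as their own items.
with_group = re.split(r"([+-])", expr)
print(with_group)  # ['5', '+', '4X1', '-', '9.3X2']

# Without a capturing group, the separators are dropped from the result.
without_group = re.split(r"[+-]", expr)
print(without_group)  # ['5', '4X1', '9.3X2']
```

Neither form gives the desired output, because in both cases the sign is treated as a separator rather than as part of the term.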

A possible solution is to use re.findall rather than re.split, and not use any capturing groups:

left = re.findall(r'(?:^|[+-])\d+(?:\.\d+)?(?:X\d+)?', left)
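
To check the findall approach against the sample from the question, here is a small sketch. It assumes the cleaned string can contain a bare number (since 'X0' has been stripped) and integer coefficients, so both the decimal part and the X term are written as optional; find_terms is a hypothetical name:

```python
import re

# Assumed shape of a term: start-of-string or a sign, a number with an
# optional decimal part, and an optional X<power> suffix.
TERM = r"(?:^|[+-])\d+(?:\.\d+)?(?:X\d+)?"


def find_terms(expr):
    # No capturing groups, so findall returns each whole match as a string.
    return re.findall(TERM, expr)


print(find_terms("5+4X1-9.3X2"))  # ['5', '+4X1', '-9.3X2']
```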

Another alternative would be to get rid of the string replacement calls you're currently doing and just use re.findall directly on the input string from the user. You'd have to reassemble the result tuples into a string (assuming you really need that at all), but that's easy to do with str.join:

def parse(s):
    pattern = r'(^|[+-])\s*(\d+(?:\.\d+)?)\s*\*\s*(X)\^(\d+)'
    return ["".join(x) for x in re.findall(pattern, s)]
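
As a usage sketch for this second approach, the version below first cuts the input at '=' so that terms on the right-hand side are not picked up by accident. That extra step, the optional decimal part, and the name parse_left are my assumptions, not something from the question:

```python
import re

# Groups: sign (or start of string), coefficient, the letter X, the power.
TERM = re.compile(r"(^|[+-])\s*(\d+(?:\.\d+)?)\s*\*\s*(X)\s*\^\s*(\d+)")


def parse_left(equation):
    # Keep only the text before '=' (the left-hand side of the equation).
    left = equation.split("=", 1)[0]
    # findall returns one tuple of groups per term; join each into a string.
    return ["".join(groups) for groups in TERM.findall(left)]


print(parse_left("5 * X^0 + 4 * X^1 - 9.3 * X^2 = 1 * X^0"))
# ['5X0', '+4X1', '-9.3X2']
```

Unlike the string-replacement version in the question, this keeps the 'X0' on the constant term; strip it afterwards if you need a bare '5'.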

Upvotes: 0

Cyphase

Reputation: 12022

This works for your example:

def clean_data(data):
    data_iter = iter(data)
    for item in data_iter:
        if item in {'+', '-'}:
            yield item + next(data_iter)
        else:
            yield item


data = ['5', '+', '4X1', '-', '9.3X2']
new_data = clean_data(data)

print(list(new_data))  # ['5', '+4X1', '-9.3X2']

Upvotes: 1
