Jibmo

Reputation:

How do I split a string containing a math expression into a list?

How do I tokenize the string:

"2+24*48/32"

Into a list:

['2', '+', '24', '*', '48', '/', '32']

Upvotes: 36

Views: 76293

Answers (12)

Xinyue Zhang

Reputation: 1

Here is an approach I use when splitting a string on various special characters. Note that it returns only the word-like pieces (here, the numbers) and drops the operators themselves. It also does not split on _, because \w matches underscores; if the string contains a _, you may need a second split.

import re

# initializing string
data = "2+24*48/32"

# printing original string
print("The original string is : " + data)

# Using re.findall()
# Splitting characters in String
res = re.findall(r"[\w']+", data)

# printing result
print("The list after performing split functionality : " + str(res))

Upvotes: 0

Glyph

Reputation: 31860

It just so happens that the tokens you want split are already Python tokens, so you can use the built-in tokenize module. It's almost a one-liner; this program:

from io import StringIO
from tokenize import generate_tokens

STRING = 1
print(
    list(
        token[STRING]
        for token in generate_tokens(StringIO("2+24*48/32").readline)
        if token[STRING]
    )
)

produces this output:

['2', '+', '24', '*', '48', '/', '32']

Upvotes: 51

readonly

Reputation: 355524

You can use split from the re module.

re.split(pattern, string, maxsplit=0, flags=0)

Split string by the occurrences of pattern. If capturing parentheses are used in pattern, then the text of all groups in the pattern are also returned as part of the resulting list.

Example code:

import re
data = re.split(r'(\D)', '2+24*48/32')

\D

When the UNICODE flag is not specified, \D matches any non-digit character; this is equivalent to the set [^0-9].
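To make the role of the capturing parentheses concrete, here is a small sketch comparing the same split with and without the group. The variable names are only illustrative.

```python
import re

expr = "2+24*48/32"

# With a capturing group, the separators are kept in the result.
with_group = re.split(r"(\D)", expr)

# Without the group, the separators are dropped.
without_group = re.split(r"\D", expr)

print(with_group)     # ['2', '+', '24', '*', '48', '/', '32']
print(without_group)  # ['2', '24', '48', '32']
```

The group is what turns the operators into list elements instead of discarding them.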

Upvotes: 36

Reputation:

I'm sure Tim meant

splitter = re.compile(r'([\D])')

If you copy exactly what he wrote, you get only the digits, not the operators.

Upvotes: 0

jbchichoko

Reputation: 1634

>>> import re
>>> my_string = "2+24*48/32"
>>> my_list = re.findall(r"-?\d+|\S", my_string)
>>> print(my_list)

['2', '+', '24', '*', '48', '/', '32']

This will do the trick. I have encountered this kind of problem before.
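One thing worth checking with this pattern is how the optional minus sign behaves. The sketch below compares it with a variant without `-?`; the second pattern is my own illustration, not part of the answer above.

```python
import re

pattern = r"-?\d+|\S"

original = re.findall(pattern, "2+24*48/32")
subtraction = re.findall(pattern, "5-3")
no_sign = re.findall(r"\d+|\S", "5-3")

print(original)     # ['2', '+', '24', '*', '48', '/', '32']
print(subtraction)  # ['5', '-3']
print(no_sign)      # ['5', '-', '3']
```

So `-?` helps with a leading negative number, but it also swallows a binary minus into the number that follows. If the expression can contain subtraction, dropping `-?` and handling the sign in a later step may be the safer choice.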

Upvotes: 1

Timotheos

Reputation: 405

This doesn't answer the question exactly, but I believe it solves what you're trying to achieve. I would add it as a comment, but I don't have permission to do so yet.

I personally would take advantage of Python's maths functionality directly with exec:

expression = "2+24*48/32"
exec("result = " + expression)
print(result)
38.0

Upvotes: 0

Jerub

Reputation: 42628

This looks like a parsing problem, and thus I am compelled to present a solution based on parsing techniques.

While it may seem that you want to 'split' this string, I think what you actually want to do is 'tokenize' it. Tokenization, or lexing, is the compilation step before parsing. I have amended my original example in an edit to implement a proper recursive descent parser here. This is the easiest way to implement a parser by hand.

import re

patterns = [
    ('number', re.compile(r'\d+')),
    ('*', re.compile(r'\*')),
    ('/', re.compile(r'/')),
    ('+', re.compile(r'\+')),
    ('-', re.compile(r'-')),
]
# \s matches whitespace only; \W would also swallow the operators
whitespace = re.compile(r'\s+')

def tokenize(string):
    while string:

        # strip off whitespace
        m = whitespace.match(string)
        if m:
            string = string[m.end():]
            continue

        for tokentype, pattern in patterns:
            m = pattern.match(string)
            if m:
                yield tokentype, m.group(0)
                string = string[m.end():]
                break
        else:
            # no pattern matched: fail instead of looping forever
            raise ValueError("Unexpected character %r" % string[0])

def parseNumber(tokens):
    tokentype, literal = tokens.pop(0)
    assert tokentype == 'number'
    return int(literal)

def parseMultiplication(tokens):
    product = parseNumber(tokens)
    while tokens and tokens[0][0] in ('*', '/'):
        tokentype, literal = tokens.pop(0)
        if tokentype == '*':
            product *= parseNumber(tokens)
        elif tokentype == '/':
            product /= parseNumber(tokens)
        else:
            raise ValueError("Parse Error, unexpected %s %s" % (tokentype, literal))

    return product

def parseAddition(tokens):
    total = parseMultiplication(tokens)
    while tokens and tokens[0][0] in ('+', '-'):
        tokentype, literal = tokens.pop(0)
        if tokentype == '+':
            total += parseMultiplication(tokens)
        elif tokentype == '-':
            total -= parseMultiplication(tokens)
        else:
            raise ValueError("Parse Error, unexpected %s %s" % (tokentype, literal))

    return total

def parse(tokens):
    tokenlist = list(tokens)
    returnvalue = parseAddition(tokenlist)
    if tokenlist:
        print('Unconsumed data', tokenlist)
    return returnvalue

def main():
    string = '2+24*48/32'
    for tokentype, literal in tokenize(string):
        print(tokentype, literal)

    print(parse(tokenize(string)))

if __name__ == '__main__':
    main()

Implementation of handling of brackets is left as an exercise for the reader. This example will correctly do multiplication before addition.
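For readers who want a starting point for the bracket exercise, here is one possible sketch, not the only way to do it. It is a separate, self-contained version rather than an extension of the code above: the function names (`parse_atom`, `parse_product`, `parse_sum`) and the single combined token regex are my own assumptions. The idea is to add one more level below multiplication that either reads a number or recurses into a bracketed sub-expression.

```python
import re

# One regex for every token: an integer, an operator, or a bracket.
TOKEN = re.compile(r"\s*(\d+|[-+*/()])")

def tokenize(text):
    """Return a list of token strings, rejecting anything unrecognized."""
    tokens, pos = [], 0
    text = text.rstrip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise ValueError("unexpected character at position %d" % pos)
        tokens.append(m.group(1))
        pos = m.end()
    return tokens

def parse_atom(tokens):
    """A number, or a full expression wrapped in brackets."""
    tok = tokens.pop(0)
    if tok == "(":
        value = parse_sum(tokens)
        if not tokens or tokens.pop(0) != ")":
            raise ValueError("missing closing bracket")
        return value
    return int(tok)  # raises ValueError if tok is not a number

def parse_product(tokens):
    value = parse_atom(tokens)
    while tokens and tokens[0] in ("*", "/"):
        op = tokens.pop(0)
        rhs = parse_atom(tokens)
        value = value * rhs if op == "*" else value / rhs
    return value

def parse_sum(tokens):
    value = parse_product(tokens)
    while tokens and tokens[0] in ("+", "-"):
        op = tokens.pop(0)
        rhs = parse_product(tokens)
        value = value + rhs if op == "+" else value - rhs
    return value

print(tokenize("2+24*48/32"))               # ['2', '+', '24', '*', '48', '/', '32']
print(parse_sum(tokenize("2+24*48/32")))    # 38.0
print(parse_sum(tokenize("(2+24)*48/32")))  # 39.0
```

Because the bracket case calls back into `parse_sum`, nesting to any depth falls out of the recursion without extra code.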

Upvotes: 18

molasses

Reputation: 3318

>>> import re
>>> re.findall(r'\d+|\D+', '2+24*48/32=10')

['2', '+', '24', '*', '48', '/', '32', '=', '10']

Matches consecutive digits or consecutive non-digits.

Each match is returned as a new element in the list.

Depending on the usage, you may need to alter the regular expression, for example to match numbers with a decimal point.

>>> re.findall(r'[0-9\.]+|[^0-9\.]+', '2+24*48/32=10.1')

['2', '+', '24', '*', '48', '/', '32', '=', '10.1']
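Another case where the expression may need adjusting is adjacent non-digit characters, such as an operator followed by a bracket, since \D+ groups them into one element. The sketch below uses a stricter pattern of my own (the name `TOKEN_PATTERN` is hypothetical) that matches a number with an optional fractional part, or exactly one non-digit, non-space character.

```python
import re

# Assumed pattern: digits with an optional ".digits" part,
# or a single character that is neither a digit nor whitespace.
TOKEN_PATTERN = r"\d+(?:\.\d+)?|[^\d\s]"

grouped = re.findall(r"\d+|\D+", "2*(3+4)")
single = re.findall(TOKEN_PATTERN, "2*(3+4)")
decimal = re.findall(TOKEN_PATTERN, "2+24*48/32=10.1")

print(grouped)  # ['2', '*(', '3', '+', '4', ')']
print(single)   # ['2', '*', '(', '3', '+', '4', ')']
print(decimal)  # ['2', '+', '24', '*', '48', '/', '32', '=', '10.1']
```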

Upvotes: 18

Ber

Reputation: 41813

This is a parsing problem, so neither regex nor split() is the "good" solution. Use a parser generator instead.

I would look closely at pyparsing. There have also been some decent articles about pyparsing in the Python Magazine.

Upvotes: 6

habnabit

Reputation: 10274

Another solution to this would be to avoid writing a calculator like that altogether. Writing an RPN parser is much simpler, and doesn't have any of the ambiguity inherent in writing math with infix notation.

import operator, math
calc_operands = {
    '+': (2, operator.add),
    '-': (2, operator.sub),
    '*': (2, operator.mul),
    '/': (2, operator.truediv),
    '//': (2, operator.floordiv),
    '%': (2, operator.mod),
    '^': (2, operator.pow),
    '**': (2, math.pow),
    'abs': (1, operator.abs),
    'ceil': (1, math.ceil),
    'floor': (1, math.floor),
    'round': (2, round),
    'trunc': (1, int),
    'log': (2, math.log),
    'ln': (1, math.log),
    'pi': (0, lambda: math.pi),
    'e': (0, lambda: math.e),
}

def calculate(inp):
    stack = []
    for tok in inp.split():
        if tok in calc_operands:
            # pop as many arguments as the operator needs, oldest first
            n_pops, func = calc_operands[tok]
            args = [stack.pop() for x in range(n_pops)]
            args.reverse()
            stack.append(func(*args))
        elif '.' in tok:
            stack.append(float(tok))
        else:
            stack.append(int(tok))
    if not stack:
        raise ValueError('no items on the stack.')
    result = stack.pop()
    # check for leftovers before returning, so the error is reachable
    if stack:
        raise ValueError('%d item(s) left on the stack.' % len(stack))
    return result

calculate('24 38 * 32 / 2 +')

Upvotes: 4

Jiayao Yu

Reputation: 818

import re

s = "2+24*48/32"

p = re.compile(r'(\W+)')

p.split(s)

Upvotes: 5

Cristian

Reputation: 43967

Regular expressions:

>>> import re
>>> splitter = re.compile(r'([+*/])')
>>> splitter.split("2+24*48/32")

You can expand the regular expression to include any other characters you want to split on.
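As a sketch of what such an expansion might look like, the pattern below also splits on minus signs and brackets. The extra characters and the filtering step are my own additions: when two separators are adjacent, re.split returns empty strings between them, so they are filtered out here.

```python
import re

# Assumed extension: also split on '-', '(' and ')'.
splitter = re.compile(r'([-+*/()])')

raw = splitter.split("2*(3-4)")
tokens = [tok for tok in raw if tok]  # drop the empty strings

print(raw)     # ['2', '*', '', '(', '3', '-', '4', ')', '']
print(tokens)  # ['2', '*', '(', '3', '-', '4', ')']
```

The hyphen is placed first inside the character class so that it is read as a literal character rather than as a range.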

Upvotes: 4
