I have seen questions similar to this, but none that address this particular issue. I have a calculator expression using the +, -, *, and / operators, and I want to normalize it so that whatever someone enters ends up in the format my program expects.
My program wants a string of the format " 10 - 7 * 5 / 2 + 3 ", with a space at the start and at the end and a single space between every number and operator. I want to take anything someone enters, such as "10-7*5/2+3" or " 10- 7*5/2 + 3 ", and turn it into that format.
My first idea was to convert the string to a list, join it with spaces, and add a space at each end, but the obvious problem is that '10' gets split into '1' and '0' and comes out as '1 0' after joining:
s = s.replace(" ", "")             # strip all existing spaces
if s[0] == "-":                    # leading minus: turn "-3+4" into "0-3+4"
    s = "0" + s
s = " " + " ".join(list(s)) + " "  # problem: "10" comes out as "1 0"
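A quick check of why this fails: list() on a string yields one element per character, so a multi-digit number is already broken up before join() ever runs. The input below is just the question's own example.

```python
s = "10-7*5/2+3"  # the question's example input

# list() on a string produces one element per character,
# so "10" has already become "1", "0" before join() runs
chars = list(s)
print(chars)  # ['1', '0', '-', '7', '*', '5', '/', '2', '+', '3']

joined = " " + " ".join(chars) + " "
print(repr(joined))  # ' 1 0 - 7 * 5 / 2 + 3 '
```

Any fix therefore has to group consecutive digits into whole numbers before joining.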
I was thinking regex might help, but I'm not entirely sure how to put that into action. The main sticking point for me is keeping '10' and other multi-digit numbers from splitting into their individual digits when I do this.
I'm using Python 3.5.
Here's one idea, if you're only ever dealing with very simple calculator expressions (i.e. whole numbers and the four operators). If your expressions can contain other elements, you'd just have to adjust the regex.
Use a regex to extract the relevant pieces, ignoring whitespace, and then re-compose them together using a join.
import re

def compose(expr):
    # a token is a run of digits OR a single operator; whitespace is never matched, so it is dropped
    elems = re.findall(r'(\d+|[-+*/])', expr)
    # put a single space between all tokens, plus one before and one after
    return ' ' + ' '.join(elems) + ' '

compose('10- 7*5/2 + 3')
# ' 10 - 7 * 5 / 2 + 3 '
compose('10-7*5/2+3')
# ' 10 - 7 * 5 / 2 + 3 '
The meat of the re.findall call is the regular expression r'(\d+|[-+*/])'.
The first bit, \d, means match one digit, and + means match one or more of the preceding expression. So together, \d+ means match one or more digits in a row, which is what keeps '10' in one piece.
The second bit, [...], is the character-set notation: it matches exactly one of the characters listed inside the brackets. Inside a set, + and * lose their special meaning, so they need no backslash, and forward slash is never special. The hyphen is the exception, because between two characters it denotes a range, so it goes first in the set (escaping it as \- also works). Don't separate the characters with commas: a comma inside the brackets is just another character that would be matched. So [-+*/] means match exactly one of +, -, *, /.
The | between the two alternatives is the standard OR operator, so the pattern matches either the first expression or the second one. The parentheses create a capturing group; when a pattern contains exactly one group, re.findall returns what that group matched. Here the group wraps the whole pattern, so it doesn't change the result and could be left out.
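Each piece of the pattern can be tried on its own to see what it contributes. A minimal sketch using the answer's input; the last call shows that a set written with commas, such as [\+,\-,\*,/], also matches a literal comma.

```python
import re

expr = "10- 7*5/2 + 3"

# \d+ : one or more digits in a row, so "10" is a single match
numbers = re.findall(r"\d+", expr)

# [-+*/] : exactly one character from the set ("-" first, so it is not a range)
operators = re.findall(r"[-+*/]", expr)

# a|b : either alternative; findall scans left to right, so token order is kept
tokens = re.findall(r"\d+|[-+*/]", expr)

# With exactly one capturing group, findall returns that group's text;
# here the group wraps the whole pattern, so the result is the same.
grouped = re.findall(r"(\d+|[-+*/])", expr)

# Commas inside [...] are literal characters, not separators.
with_commas = re.findall(r"[\+,\-,\*,/]", "1,2+3")

print(numbers)            # ['10', '7', '5', '2', '3']
print(operators)          # ['-', '*', '/', '+']
print(tokens)             # ['10', '-', '7', '*', '5', '/', '2', '+', '3']
print(grouped == tokens)  # True
print(with_commas)        # [',', '+']
```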
Just like @fukanchik suggested, this is usually done the other way around: break the input string down into its basic components, then reassemble them however you like.
I'd say you are on the right track with regex, as it's well suited to parsing this kind of input (well suited in the sense that you don't need to write a more advanced parser). For this, just define each of your symbols as a small regex:
import re
lexeme_regexes = [r"\+", "-", r"\*", "/", r"\d+"]  # raw strings keep each backslash for the regex engine
and then assemble a big regex that you can use for "walking" your input string:
regex = re.compile("|".join(lexeme_regexes))
lexemes = regex.findall("10 - 7 * 5 / 2 + 3")
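To get to your normalized form, just assemble it again, adding the space at each end that the question's format asks for:
normalized = " " + " ".join(lexemes) + " "  # ' 10 - 7 * 5 / 2 + 3 '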
This example doesn't ensure that all operators are seamlessly split by whitespace, though; that'll need some more effort.
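One possible reading of that extra effort is guarding against input the regex does not recognise, since findall silently skips any character that matches none of the lexemes. A sketch of a swapped-in technique (not the answerer's code), following the tokenizer recipe in the Python `re` documentation: whitespace and "anything else" get their own alternatives, and unknown characters raise an error. The names TOKEN_SPEC, tokenize and normalize are hypothetical, and f-strings are avoided so it also runs on Python 3.5.

```python
import re

# Hypothetical token table; re.escape() builds the operator alternatives,
# so no hand-written backslashes are needed.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("OP", "|".join(re.escape(op) for op in "+-*/")),
    ("SKIP", r"\s+"),    # whitespace: consumed, never emitted
    ("MISMATCH", r"."),  # any other character is an error
]
TOKEN_RE = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Split text into number/operator tokens, rejecting unknown characters."""
    tokens = []
    for match in TOKEN_RE.finditer(text):
        kind = match.lastgroup  # name of the group that matched
        if kind == "SKIP":
            continue
        if kind == "MISMATCH":
            raise ValueError("unexpected %r at position %d" % (match.group(), match.start()))
        tokens.append(match.group())
    return tokens


def normalize(text):
    """Return the ' 10 - 7 * 5 / 2 + 3 ' form the question asks for."""
    return " " + " ".join(tokenize(text)) + " "


print(repr(normalize(" 10- 7*5/2 + 3 ")))  # ' 10 - 7 * 5 / 2 + 3 '
```

With this, normalize("10 % 3") raises a ValueError pointing at the "%" instead of quietly returning " 10 3 ".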
I'd suggest a simple approach: remove all spaces, then go through the string character by character, adding a space before and after each operator symbol.
Anything with two operators in a row is going to be invalid syntax anyway, so you can leave that to your existing calculator code to throw errors on.
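operators = "+-*/"                    # the operator symbols you accept
unformatted_string = "10- 7*5/2 + 3"  # example input

sanitised_string = ""
for char in unformatted_string.replace(" ", ""):  # remove all spaces first
    if char in operators:
        sanitised_string += " " + char + " "      # pad each operator with spaces
    else:
        sanitised_string += char                  # digits pass through unchanged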
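A variant of the loop above, assuming you also want the question's leading and trailing spaces. It builds the string with str.join instead of repeated +=, and collapses the doubled space that two adjacent operators would otherwise produce. Whether "10*-3" is then rejected is still left to the calculator code, as suggested above. The names OPERATORS and sanitise are hypothetical.

```python
OPERATORS = set("+-*/")  # hypothetical name for the accepted operator symbols


def sanitise(expr):
    # Same idea as the loop: drop the user's spaces, then pad each operator.
    padded = "".join(" %s " % ch if ch in OPERATORS else ch
                     for ch in expr.replace(" ", ""))
    # str.split() with no argument splits on runs of whitespace, so two operators
    # in a row ("10*-3") end up separated by one space instead of two.
    return " " + " ".join(padded.split()) + " "


print(repr(sanitise("10-7*5/2+3")))       # ' 10 - 7 * 5 / 2 + 3 '
print(repr(sanitise(" 10- 7*5/2 + 3 ")))  # ' 10 - 7 * 5 / 2 + 3 '
print(repr(sanitise("10*-3")))            # ' 10 * - 3 '
```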