Reputation: 27966
NOTE: Python 3.2
I want to write a Python script that receives simple C++ expressions as input and outputs the very same expressions as tokens.
I vaguely remember my course in compilation, and I need something far less complex than a compiler.
Examples
int& name1=arr1[place1];
int *name2= arr2[ place2];
should output
[ "int", "&", "name1", "=", "arr1", "[", "place1", "]" ]
[ "int", "*", "name2", "=", "arr2", "[", "place2", "]" ]
The spaces shouldn't matter, and I don't want them in the output.
This seems like a very simple task for someone who knows what they're doing, but I keep getting stray whitespace or splits in the wrong places.
I would greatly appreciate a quick solution for this - it really looks like a one-liner to me.
Note that I only need expressions like I showed here. Nothing fancy.
Thanks
Upvotes: 0
Views: 116
Reputation: 22
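The first step is to remove the spaces, that is, replace ' ' with ''. Then make a list of special characters or words, and surround each one with a delimiter. Finally, split the line on that delimiter. Removing every space assumes no two words are separated only by a space, which holds for the examples in the question. Here is an example:
import sys

for line in sys.stdin:
    # Drop the newline, the trailing semicolon, and all spaces.
    line = line.strip().rstrip(';').replace(' ', '')
    for ch in '&*=[]':  # special characters seen in the question
        line = line.replace(ch, ',' + ch + ',')
    a = [tok for tok in line.split(',') if tok]  # skip empty pieces
    print(a)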
Upvotes: 1
Reputation: 26380
Looks to me like you need to define a list of "special/operator" characters. Replace any of those characters with itself plus a space of padding on either side. Use str.split() to turn the string into a list of "words". If you need a string representation, finish up with "', '".join(wordlist) and add a "[ '" to the front and "' ]" to the end.
I'm almost certainly missing a few things, like looking for semicolons to strip off, or to use in breaking apart concatenated expressions. You weren't specific about how many expressions you'd read in at once. If you read in many at a time, you could split on the semicolon character, then iterate over the resulting list of expressions.
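The padding idea above could be sketched roughly as follows. This is a minimal sketch, not a full tokenizer: the OPERATORS set and the names tokenize and split_expressions are assumptions made for illustration, and only the single-character operators from the question are covered.

```python
# Assumed operator set: only the characters that appear in the question.
OPERATORS = "&*=[]"

def tokenize(expression):
    """Pad each operator with spaces, then split on whitespace."""
    expression = expression.strip().rstrip(";")
    for op in OPERATORS:
        expression = expression.replace(op, " " + op + " ")
    # split() with no argument collapses runs of whitespace,
    # so the extra padding never produces empty tokens.
    return expression.split()

def split_expressions(text):
    """Split several expressions on ';' and tokenize each non-empty one."""
    return [tokenize(part) for part in text.split(";") if part.strip()]

print(tokenize("int& name1=arr1[place1];"))
```

Because the padding is applied one character at a time, a two-character operator such as == would come out as two separate tokens.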
Upvotes: 2
Reputation: 180441
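I'm not overly familiar with C++, but you could maybe use re.findall with a character class of special chars:
import re

lines = """int& name1=arr1[place1];
int *name2= arr2[ place2];"""

# Match one special character, or a run of word characters
# (letters, digits, underscore). Anything else is skipped.
for line in lines.splitlines():
    print(re.findall(r"[*$\[\]&=]|\w+", line))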
['int', '&', 'name1', '=', 'arr1', '[', 'place1', ']']
['int', '*', 'name2', '=', 'arr2', '[', 'place2', ']']
Upvotes: 2
Reputation: 1126
Here is a generator that might do the trick:
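def parseCPP(line):
    # Drop surrounding whitespace and the trailing semicolon.
    line = line.strip().rstrip(";")
    word = ""
    for i in line:
        if i.isalnum():
            word += i  # extend the current identifier or number
        else:
            if word:
                yield word
                word = ""
            if not i.isspace():
                yield i  # any other non-space character is its own token
    if word:
        yield word  # flush a word that ends the line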
Note that this just picks up consecutive runs of alphanumeric characters. Any other non-space character is assumed to be an operator/token by itself.
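To make that rule concrete, here is a self-contained sketch of the same grouping rule written with itertools.groupby instead of a manual loop. The name tokens_by_run is hypothetical, and this is an alternative formulation for illustration rather than the generator above.

```python
from itertools import groupby

def tokens_by_run(line):
    """Yield runs of alphanumeric characters as words and every
    other non-space character as a token of its own."""
    line = line.strip().rstrip(";")
    # groupby splits the line into maximal runs that share the same
    # str.isalnum() result: True for word runs, False for the rest.
    for is_word, run in groupby(line, key=str.isalnum):
        if is_word:
            yield "".join(run)
        else:
            for ch in run:
                if not ch.isspace():
                    yield ch

print(list(tokens_by_run("int *name2= arr2[ place2];")))
print(list(tokens_by_run("a_b==c")))
```

The second call shows the consequence of the rule: an underscore is not alphanumeric, so a_b is split into three tokens, and == becomes two '=' tokens.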
Hope this helps :)
Upvotes: 0