Gulzar

Reputation: 27966

Python parser for simple C++ expressions

NOTE: Python 3.2

I want to write a Python script that receives simple C++ expressions as input and outputs the very same expressions as tokens.

I vaguely remember my course in compilation, and I need something far less complex than a compiler.

Examples

int& name1=arr1[place1];
int *name2=    arr2[ place2];

should output

[    "int", "&", "name1", "=", "arr1", "[", "place1", "]"    ]
[    "int", "*", "name2", "=", "arr2", "[", "place2", "]"    ]

The spaces shouldn't matter, and I don't want them in the output.

This seems like a very simple task for someone who knows what they're doing, but I keep getting stray whitespace or splitting in the wrong places.

I would greatly appreciate a quick solution for this. It really looks like a one-liner to me.

Note that I only need expressions like I showed here. Nothing fancy.

Thanks

Upvotes: 0

Views: 116

Answers (4)

Bhavani A B

Reputation: 22

The first step is to remove the spaces, that is, replace ' ' with ''. Then make a list of special characters or words, and replace each one with itself surrounded by a delimiter. Finally, split the line on that delimiter. Here is an example:

import sys

for line in sys.stdin:
    line = line.replace(' ', '')       # drop all spaces
    line = line.replace('&', ',&,')    # surround the operator with the delimiter
    a = line.split(',')                # split on the delimiter

Upvotes: 1

Surreal Dreams

Reputation: 26380

Looks to me like you need to define a list of "special/operator" characters. Replace each of those characters with itself plus a space of padding on either side. Use str.split() to turn the string into a list of "words". If you need a string representation, finish up with "', '".join(wordlist) and add a "[ '" to the front and "' ]" to the end.

I'm almost certainly missing a few things, like looking for semicolons to strip off, or to use in breaking apart concatenated expressions. You weren't specific about how many expressions you'd read in at once. If you read in many at a time, you could split on the semicolon character, then iterate over the resulting list of expressions.
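A minimal sketch of the pad-and-split idea described above, assuming the operator set is just the characters from the question's examples. The names SPECIAL_CHARS, tokenize and tokenize_all are made up for illustration, and splitting on ';' is only one way to handle several expressions at once:

```python
# Hypothetical operator set taken from the question's examples; extend as needed.
SPECIAL_CHARS = "&*=[]"

def tokenize(expression):
    """Pad each special character with spaces, then split on whitespace."""
    for ch in SPECIAL_CHARS:
        expression = expression.replace(ch, " " + ch + " ")
    return expression.split()

def tokenize_all(source):
    """Split the input on ';' and tokenize every non-empty expression."""
    return [tokenize(part) for part in source.split(";") if part.strip()]

source = "int& name1=arr1[place1];\nint *name2=    arr2[ place2];"
for tokens in tokenize_all(source):
    print(tokens)
```

Because str.split() with no argument collapses any run of whitespace, the extra padding never shows up as empty tokens.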

Upvotes: 2

Padraic Cunningham

Reputation: 180441

I'm not overly familiar with C++, but you could use re.findall with a character class of special characters:

import re

lines = """int& name1=arr1[place1];
int *name2=    arr2[ place2];"""
for line in lines.splitlines():
    # match one special character, or a run of word characters
    print(re.findall(r"[*$\[\]&=]|\w+", line))

Output:

['int', '&', 'name1', '=', 'arr1', '[', 'place1', ']']
['int', '*', 'name2', '=', 'arr2', '[', 'place2', ']']

Upvotes: 2

Michael S Priz

Reputation: 1126

Here is a generator that might do the trick:

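def parseCPP(line):
    # drop surrounding whitespace (e.g. a newline) and the trailing semicolon
    line = line.strip().rstrip(";")
    word = ""
    for i in line:
        if i.isalnum():
            word += i
        else:
            if word:
                yield word
                word = ""
            if i != " ":
                yield i
    if word:  # emit a word that runs to the end of the line
        yield word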

Note that this just picks up consecutive runs of alphanumeric characters. Any other non-space character is assumed to be an operator/token by itself.
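One consequence of the note above: str.isalnum() is False for '_', so an identifier such as my_ptr would be split into three tokens. A hedged variant of the same character-by-character idea, where tokenize_chars and its word_extras parameter are hypothetical names introduced only for this sketch:

```python
def tokenize_chars(line, word_extras="_"):
    """Yield identifier-like runs and single-character operators."""
    word = ""
    for ch in line.strip().rstrip(";"):
        if ch.isalnum() or ch in word_extras:
            word += ch          # still inside a word
            continue
        if word:                # a non-word character ends the current word
            yield word
            word = ""
        if not ch.isspace():    # whitespace is dropped, anything else is a token
            yield ch
    if word:                    # flush a word that runs to the end of the line
        yield word

print(list(tokenize_chars("int *my_ptr=    arr2[ place2];")))
# ['int', '*', 'my_ptr', '=', 'arr2', '[', 'place2', ']']
```

Passing word_extras="" gives the strictly alphanumeric behaviour the note describes.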

Hope this helps :)

Upvotes: 0
