Reputation: 212
How can I parse a syntactically correct C file containing a single function but with non-defined types? The file is automatically indented (4 spaces) using this service with brackets below each block keyword, i.e. something like
if ( condition1 )
{
    func1( int hi );
    unktype foo;
    do
    {
        if ( condition2 )
            goto LABEL_1;
    }
    while ( condition3 );
}
else
{
    float a = bar(baz, 0);
LABEL_1:
    int foobar = (int)a;
}
The first line of the file is the function prototype and the second is a "{". Every line ends with \n, and the last line is simply "}\n". There are lots of many-to-one gotos, and the labels are often outside the block they belong to (awful, I know :D ). I only care about structural information, i.e. blocks and statement types. Here is what I'd like to get (when printed, indentation added for clarity):
[If(condition = [condition1],
    bodytrue = ["func1( int hi );",
                "unktype foo;",
                DoWhile(condition = [condition3],
                        body = [
                            SingleLineIf(condition = [condition2],
                                         bodytrue = ["goto LABEL_1;"],
                                         bodyelse = []
                            )
                        ]
                )
    ],
    bodyelse = ["float a = bar(baz, 0);",
                "int foobar = (int)a;"
    ]
)]
with condition1, condition2 and condition3 being strings. Other constructs would work the same way.
The labels can be discarded. I also need to include blocks not associated with any special statement, like Block([...]).
Standard C parsers for Python don't work (for instance, pycparser gives a syntax error) because of the unknown types.
Upvotes: 0
Views: 259
Reputation: 63709
Pyparsing includes a simple C parser as part of its examples. Here is a parser that will process your sample code, and a little bit more (it includes support for for statements).
This is not a very good C parser. It brushes broadly across if, while, and do conditions as just strings in nested parentheses. But it may give you a start on extracting what bits you are interested in.
import pyparsing as pp

# Keywords, and punctuation that is matched but dropped from the results
IF, WHILE, DO, ELSE, FOR = map(pp.Keyword, "if while do else for".split())
SEMI, COLON, LBRACE, RBRACE = map(pp.Suppress, ';:{}')

# Forward declarations: blocks contain statements and statements contain blocks
stmt_body = pp.Forward()
single_stmt = pp.Forward()
stmt_block = stmt_body | single_stmt

# A condition is whatever sits inside balanced parentheses; it is not parsed further
if_condition = pp.ungroup(pp.nestedExpr('(', ')'))
while_condition = if_condition()  # calling an expression returns a copy
for_condition = if_condition()

if_stmt = pp.Group(IF
                   + if_condition("condition")
                   + stmt_block("bodyTrue")
                   + pp.Optional(ELSE + stmt_block("bodyElse"))
                   )
do_stmt = pp.Group(DO
                   + stmt_block("body")
                   + WHILE
                   + while_condition("condition")
                   + SEMI
                   )
while_stmt = pp.Group(WHILE + while_condition("condition")
                      + stmt_block("body"))
for_stmt = pp.Group(FOR + for_condition("condition")
                    + stmt_block("body"))

# Any other statement is kept as an opaque string up to the next semicolon
other_stmt = (~(LBRACE | RBRACE) + pp.SkipTo(SEMI) + SEMI)

single_stmt <<= if_stmt | do_stmt | while_stmt | for_stmt | other_stmt
stmt_body <<= pp.nestedExpr('{', '}', content=single_stmt)

# Labels ("LABEL_1:") are skipped wherever they appear
label = pp.pyparsing_common.identifier + COLON
parser = pp.OneOrMore(stmt_block)
parser.ignore(label)

sample = """
if ( condition1 )
{
    func1( int hi );
    unktype foo;
    do
    {
        if ( condition2 )
            goto LABEL_1;
    }
    while ( condition3 );
}
else
{
    float a = bar(baz, 0);
LABEL_1:
    int foobar = (int)a;
}
"""

print(parser.parseString(sample).dump())
prints:
[['if', 'condition1', ['func1( int hi )', 'unktype foo', ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']], 'else', ['float a = bar(baz, 0)', 'int foobar = (int)a']]]
[0]:
  ['if', 'condition1', ['func1( int hi )', 'unktype foo', ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']], 'else', ['float a = bar(baz, 0)', 'int foobar = (int)a']]
  - bodyElse: ['float a = bar(baz, 0)', 'int foobar = (int)a']
  - bodyTrue: ['func1( int hi )', 'unktype foo', ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']]
    [0]:
      func1( int hi )
    [1]:
      unktype foo
    [2]:
      ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']
      - body: [['if', 'condition2', 'goto LABEL_1']]
        [0]:
          ['if', 'condition2', 'goto LABEL_1']
          - bodyTrue: 'goto LABEL_1'
          - condition: 'condition2'
      - condition: 'condition3'
  - condition: 'condition1'
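If you want node objects closer to the If(...)/DoWhile(...) layout from the question, one possible next step is to walk the nested list (what asList() on the parse results would give you) and build small classes from it. The following is only a rough sketch: the class names If, Loop and Block and the helpers convert and as_body are made up for illustration and are not part of pyparsing. It works positionally on plain lists, so it assumes the list shapes printed above, and it assumes no ordinary statement is literally the bare text "if", "do", "while" or "for". The parsed literal is copied from the first line of the output above so that the sketch runs without pyparsing installed.

```python
from dataclasses import dataclass, field

# Hypothetical node types, named after the layout asked for in the question.
@dataclass
class If:
    condition: str
    bodytrue: list
    bodyelse: list = field(default_factory=list)

@dataclass
class Loop:
    kind: str        # "do", "while" or "for"
    condition: str
    body: list

@dataclass
class Block:         # braces not attached to any keyword
    body: list

# A tuple (not a set) so that membership tests also work on unhashable lists.
KEYWORDS = ("if", "do", "while", "for")

def as_body(item):
    """Return a body as a list of nodes, whether it was braced or a single statement."""
    if isinstance(item, list) and not (item and item[0] in KEYWORDS):
        return [convert(s) for s in item]      # braced block: list of statements
    return [convert(item)]                     # single statement without braces

def convert(item):
    """Turn one parsed statement (string or nested list) into a node."""
    if isinstance(item, str):
        return item + ";"                      # the parser suppressed the semicolon
    head = item[0] if item else None
    if head == "if":                           # ['if', cond, body, 'else', body]
        else_body = as_body(item[4]) if len(item) > 3 else []
        return If(item[1], as_body(item[2]), else_body)
    if head == "do":                           # ['do', body, 'while', cond]
        return Loop("do", item[3], as_body(item[1]))
    if head in ("while", "for"):               # [keyword, cond, body]
        return Loop(head, item[1], as_body(item[2]))
    return Block([convert(s) for s in item])   # anonymous nested block

# Nested list copied from the parser output shown above.
parsed = [['if', 'condition1',
           ['func1( int hi )', 'unktype foo',
            ['do', [['if', 'condition2', 'goto LABEL_1']], 'while', 'condition3']],
           'else',
           ['float a = bar(baz, 0)', 'int foobar = (int)a']]]

tree = [convert(stmt) for stmt in parsed]
print(tree[0])
```

With the real parser you would replace the literal with parser.parseString(sample).asList(). Going through the named results (bodyTrue, condition, ...) instead of list positions would probably be more robust, but it needs more care to tell a braced body from a single statement.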
Upvotes: 1