Reputation: 131
I'm building a lexical analyzer with flex, and I need to process functions that have more than four instructions. How can I count the number of instructions in a C source file? I tried counting the semicolons (;), but how do I handle a case like this: if(strcmp(str1,str2)==2 && strlen(str1)>4) How many instructions does that line contain? I think there are six: if, strcmp, strlen, &&, ==, >. Is there a pattern that defines an instruction?
Upvotes: 1
Views: 197
Reputation: 5678
I couldn't resist this question because I am currently considering using Haskell as a kind of glorified Perl for analysing and bulk-editing my C project, and wondered how simple it would be to use Language-C for this. Of course there are many other good analysers around, in much more popular languages (and, as Jörg points out, a lexical analyser won't cut the mustard here!), but anyway, here goes:
    module Main where

    import System.Environment
    import Language.C.Parser
    import Language.C.Data.InputStream
    import Language.C.Data.Position
    import Language.C.Syntax.AST
    import Language.C.Syntax.Utils
    import Language.C.Analysis.DeclAnalysis
    import Language.C.Data.Ident

    main :: IO ()
    main = do
      [cFileName] <- getArgs
      stream <- readInputStream cFileName
      let startpos = initPos cFileName
      case parseC stream startpos of
        Left parseError -> error $ show parseError
        Right translation -> mapM_ (putStrLn . show) $ mungeTrans translation

    mungeTrans (CTranslUnit decls _) = mungeDecls decls

    mungeDecls [] = []
    mungeDecls ((CFDefExt funDef):decls) = mungeFunDef funDef : mungeDecls decls
    mungeDecls (_:decls) = mungeDecls decls

    mungeFunDef (CFunDef _ declarator _ cStatement _) = (nameOf declarator, numberOfStatements cStatement) where
      nameOf (CDeclr (Just name) _ _ _ _) = identToString name
      nameOf _ = "?"
      numberOfStatements cstat = case getSubStmts cstat of
        [] -> 1
        block -> foldl1 (+) $ map numberOfStatements block
I hope non-Haskellers can follow what is going on: mungeDecls loops through the syntax tree and considers only function definitions, which mungeFunDef turns into a pair of the function name and the number of statements.
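The problem of what precisely counts as a C statement is cowardly sidestepped by using the getSubStmts utility function, which e.g. considers f(x) && g(x); as one statement, not two. Note that only statements without sub-statements are counted as 1; a statement that has sub-statements contributes the sum of its parts.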
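For readers who do not read Haskell, the counting rule in numberOfStatements can be sketched in C. This is only an illustration: the Stmt struct, count_statements and example_count are hypothetical names invented here, and the tiny tree is not Language.C's real AST. It assumes that an if statement exposes its branches as sub-statements and that an expression statement has none.

```c
#include <stddef.h>

/* Hypothetical miniature statement tree; NOT Language.C's real AST. */
typedef struct Stmt {
    size_t n_subs;                   /* number of direct sub-statements */
    const struct Stmt *const *subs;  /* the direct sub-statements */
} Stmt;

/* Same rule as numberOfStatements: a statement with no sub-statements
   counts as 1, otherwise the count is the sum over its sub-statements. */
int count_statements(const Stmt *s) {
    if (s->n_subs == 0) {
        return 1;
    }
    int total = 0;
    for (size_t i = 0; i < s->n_subs; i++) {
        total += count_statements(s->subs[i]);
    }
    return total;
}

/* Models the body  { f(x) && g(x); if (c) { a(); b(); } else d(); }
   under the assumptions stated above. */
int example_count(void) {
    static const Stmt expr = {0, NULL};              /* f(x) && g(x); */
    static const Stmt a = {0, NULL};                 /* a(); */
    static const Stmt b = {0, NULL};                 /* b(); */
    static const Stmt d = {0, NULL};                 /* d(); */
    static const Stmt *const then_subs[] = {&a, &b};
    static const Stmt then_block = {2, then_subs};   /* { a(); b(); } */
    static const Stmt *const if_subs[] = {&then_block, &d};
    static const Stmt if_stmt = {2, if_subs};        /* if ... else ... */
    static const Stmt *const body_subs[] = {&expr, &if_stmt};
    static const Stmt body = {2, body_subs};
    return count_statements(&body);
}
```

In this model example_count() returns 4: the expression statement, a(), b() and d() each count once, while the if and the enclosing blocks are not counted themselves.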
Using this on the (preprocessed) main.c from the rlwrap project yields:
("main",12)
("fork_child",21)
("main_loop",306)
("init_rlwrap",61)
("check_optarg",2)
("current_option",4)
etc.
I hope this shameless plug of Haskell convinces some people to try it out for this kind of work!!
Upvotes: 0
Reputation: 43
Off the top of my head:
You'll need to watch for tokens immediately before a parenthesis that are unresolved values (that is, not any of the basic operators). These may be assumed to be instructions.
You would likewise want to treat the comparison operators in your lexer as a form of instruction.
A corner case: if a token begins with a ", it may be assumed to be the start of a string, which ends at the next " that is not preceded by a \. These should be combined into a single string token.
Use C's rules for naming variables and functions to help you break unresolved values into sequences of tokens. (Example: the token 8*4*( breaks C's naming rules, so you know to resolve it by splitting it apart, using the operators as delimiters.)
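The tokenising rules above might be sketched in plain C roughly as follows. This is a hand-written stand-in for flex rules, not flex itself, and token_length and count_tokens are hypothetical names. It assumes only a small subset of C (identifiers, decimal integers, string literals, a partial list of two-character operators). One detail differs from "not preceded by a \": the sketch skips each backslash together with the character after it, so that a literal such as "\\" still ends at the right quote.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Length of the token starting at p.
   p must not point at whitespace or at the terminating '\0'. */
size_t token_length(const char *p) {
    const char *q = p;
    if (*q == '"') {                                   /* string literal: one token */
        q++;
        while (*q != '\0' && *q != '"') {
            if (*q == '\\' && q[1] != '\0') {
                q++;                                   /* skip the escaped character */
            }
            q++;
        }
        if (*q == '"') {
            q++;                                       /* consume the closing quote */
        }
        return (size_t)(q - p);
    }
    if (isalpha((unsigned char)*q) || *q == '_') {     /* identifier or keyword */
        while (isalnum((unsigned char)*q) || *q == '_') {
            q++;
        }
        return (size_t)(q - p);
    }
    if (isdigit((unsigned char)*q)) {                  /* decimal integer (simplified) */
        while (isdigit((unsigned char)*q)) {
            q++;
        }
        return (size_t)(q - p);
    }
    /* Two-character operators; a deliberately partial list. */
    static const char *const two_char[] = {"==", "!=", "<=", ">=", "&&", "||"};
    for (size_t i = 0; i < sizeof two_char / sizeof two_char[0]; i++) {
        if (strncmp(p, two_char[i], 2) == 0) {
            return 2;
        }
    }
    return 1;                                          /* any other single character */
}

/* Number of tokens in src, skipping whitespace between them. */
size_t count_tokens(const char *src) {
    size_t n = 0;
    while (*src != '\0') {
        if (isspace((unsigned char)*src)) {
            src++;
            continue;
        }
        src += token_length(src);
        n++;
    }
    return n;
}
```

With these rules, 8*4*( splits into five tokens (8, *, 4, *, and the parenthesis), and the condition from the question splits into 18. Which of those tokens count as "instructions" is still a separate decision.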
Upvotes: 0
Reputation: 369468
I don't think you can do that lexically; you will need at least some syntactic analysis, and probably semantic analysis as well.
Also, you need to define what an "instruction" is before you can even start to think about counting them. After all, the term "instruction" has no meaning in C; you will first need to give it one.
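To see how much the answer depends on the definition, here is a small C sketch comparing two possible character-level counts. Both function names are hypothetical, and neither count is a real statement count: the first counts every semicolon, as in the question's first attempt, and the second counts only semicolons outside parentheses, string or character literals, and comments.

```c
#include <stddef.h>

/* Counts every ';' character in s. */
size_t count_all_semicolons(const char *s) {
    size_t n = 0;
    for (; *s != '\0'; s++) {
        if (*s == ';') {
            n++;
        }
    }
    return n;
}

/* Counts ';' only outside parentheses, literals and comments. */
size_t count_terminators(const char *s) {
    size_t n = 0;
    int depth = 0;                                     /* parenthesis nesting depth */
    while (*s != '\0') {
        if (s[0] == '/' && s[1] == '*') {              /* block comment */
            s += 2;
            while (*s != '\0' && !(s[0] == '*' && s[1] == '/')) {
                s++;
            }
            if (*s != '\0') {
                s += 2;                                /* skip the closing marker */
            }
        } else if (s[0] == '/' && s[1] == '/') {       /* line comment */
            while (*s != '\0' && *s != '\n') {
                s++;
            }
        } else if (*s == '"' || *s == '\'') {          /* string or char literal */
            char quote = *s++;
            while (*s != '\0' && *s != quote) {
                if (*s == '\\' && s[1] != '\0') {
                    s++;                               /* skip the escaped character */
                }
                s++;
            }
            if (*s != '\0') {
                s++;                                   /* skip the closing quote */
            }
        } else {
            if (*s == '(') {
                depth++;
            } else if (*s == ')' && depth > 0) {
                depth--;
            } else if (*s == ';' && depth == 0) {
                n++;
            }
            s++;
        }
    }
    return n;
}
```

For the input for (i = 0; i < n; i++) puts(";"); /* done; */ the first function returns 5 and the second returns 1, while if (a) { } yields 0 from both even though it is a statement. Neither number means anything until "instruction" has been defined, and a definition in terms of statements needs a parser rather than a character scan.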
Upvotes: 1