Reputation: 51
I ran a static code analysis on a couple of projects and got the Cyclomatic Complexity for every file in those projects from the report that was generated. Now I want to calculate the average Cyclomatic Complexity for the whole project.
How would I best achieve that?
Just adding up the Cyclomatic Complexity values of each file and then dividing it by the number of files seems wrong to me since a short header file would have the same impact as a very long file. Also, I would like to avoid weighting the file's importance by lines of code.
Is there another way to do it? For example, with a median?
Upvotes: 2
Views: 1929
Reputation: 95334
Cyclomatic complexity in effect measures the number of decisions in your source code. (It's actually more complex than that in general, but it reduces to that in the case of structured code.) It is often computed as #decisions+1, even in the more complex case (yes, that's an approximation).
So, if you have two CC measures, x and y, with
CC(x)=#decisions(x)+1,
and
CC(y)=#decisions(y)+1,
the total
CC(x with y) = #decisions(x)+#decisions(y)+1=CC(x)+CC(y)-1
So if you have N sets of CC data, a good approximation of the overall CC is:
[Sum i=1..N: CC(i)]-(N-1)
If you want an average per file across your system, divide the above by N.
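As a minimal sketch of the arithmetic above, assuming the per-file CC values have already been extracted from the report into a list (the function names and the sample values are hypothetical, not taken from any particular tool):

```python
def combined_cc(file_ccs):
    """Approximate overall CC: sum of the per-file CCs minus (N - 1)."""
    n = len(file_ccs)
    if n == 0:
        raise ValueError("need at least one CC value")
    # Each file contributes its decisions plus 1; keep only one of those +1s.
    return sum(file_ccs) - (n - 1)


def average_cc_per_file(file_ccs):
    """Overall CC approximation divided by the number of files."""
    return combined_cc(file_ccs) / len(file_ccs)


# Hypothetical per-file CC values, for illustration only.
example = [1, 4, 12, 3]
print(combined_cc(example))          # (1 + 4 + 12 + 3) - 3 = 17
print(average_cc_per_file(example))  # 17 / 4 = 4.25
```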
Upvotes: 5
Reputation: 2135
As you said, an average is not very helpful, because a large number of simple functions can "hide" one very complex function. So I prefer to compare distribution graphs, which are more informative.
Disclaimer: I am an author of Metrix++, which does this. Please check what the distribution graph looks like: http://metrixplusplus.sourceforge.net/workflow.html#workflow_view_summary_section
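To illustrate the general idea of looking at a distribution rather than a single average, here is a rough sketch that buckets CC values and prints a text histogram. This is not how Metrix++ works internally; the function names, the bucket size, and the sample values are assumptions made for the example.

```python
from collections import Counter


def cc_distribution(values, bucket_size=5):
    """Count CC values per bucket; returns {bucket_start: count}, sorted by bucket."""
    counts = Counter((v // bucket_size) * bucket_size for v in values)
    return dict(sorted(counts.items()))


def render(dist, bucket_size=5):
    """Render the distribution as one text line per non-empty bucket."""
    lines = []
    for start, count in dist.items():
        end = start + bucket_size - 1
        lines.append(f"{start:>3}-{end:<3} {'#' * count} ({count})")
    return "\n".join(lines)


# Hypothetical CC values: many simple units and one very complex one.
values = [1, 2, 2, 3, 4, 6, 7, 35]
print(render(cc_distribution(values)))
```

In this sample the single value of 35 shows up as its own bucket far from the rest, which an average alone would not reveal.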
Upvotes: 0
Reputation: 134
From your question, I would say you first need to define your intent regarding the average CC.
If you want to compute the average CC on the project's files, say to compare it to another project, then adding up CC from files and dividing by the number of code files is the right thing to do. But it gives you nothing better than an average: it is not representative of the intended characteristics at the individual file level. So when you say:
since a short header file would have the same impact as a very long file
this is wrong. The short header file and the very long files would not have the same CC, and you would not use the average CC to compare individual files.
If the average CC is used to compare projects with each other: from a statistics perspective, software metrics tend to have heavily skewed distributions, so it may indeed be better to use a median. But once again, it depends heavily on what you intend to use it for.
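A small sketch of why the median can be the more robust choice on skewed data, using Python's standard `statistics` module. The per-file CC values and the `summarize` helper are made up for illustration.

```python
import statistics


def summarize(file_ccs):
    """Return (mean, median) of the per-file CC values."""
    return statistics.mean(file_ccs), statistics.median(file_ccs)


# Hypothetical skewed sample: mostly simple files plus one outlier.
file_ccs = [1, 2, 2, 3, 4, 6, 7, 35]
mean_cc, median_cc = summarize(file_ccs)
print(mean_cc)    # 60 / 8 = 7.5, pulled up by the outlier
print(median_cc)  # (3 + 4) / 2 = 3.5, closer to a typical file
```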
Upvotes: 0