Average Cyclomatic Complexity of multiple files

I ran a static code analysis on a couple of projects and got the Cyclomatic Complexity for every file in those projects from the report that was generated. Now I want to calculate the average Cyclomatic Complexity for the whole project.

How would I best achieve that?

Just adding up the Cyclomatic Complexity values of each file and then dividing it by the number of files seems wrong to me since a short header file would have the same impact as a very long file. Also, I would like to avoid weighting the file's importance by lines of code.

Is there another way to do it? For example, with a median?

Upvotes: 2

Answers (3)

Ira Baxter

Reputation: 95334

Cyclomatic complexity in effect measures the number of decisions in your source code. (It's actually more complex than that in general, but it reduces to that in the case of structured code.) It is often computed as #decisions+1, even in the more complex case (yes, that's an approximation).

So, if you have two CC measures, x and y, with

   CC(x)=#decisions(x)+1,

and

   CC(y)=#decisions(y)+1,

the total

   CC(x with y) = #decisions(x)+#decisions(y)+1=CC(x)+CC(y)-1

So if you have N sets of CC data, a good approximation of the overall CC is:

   [Sum i=1..N: CC(i)]-(N-1)

If you want an average per file across your system, divide the above by N.
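
As a minimal sketch of the formula above, assuming the per-file CC values have already been pulled from the analysis report into a plain list (the function names and the sample values are hypothetical, not taken from any tool):

```python
def combined_cc(file_ccs):
    """Approximate overall CC: sum of per-file CC values minus (N - 1)."""
    n = len(file_ccs)
    if n == 0:
        raise ValueError("need at least one CC value")
    return sum(file_ccs) - (n - 1)


def average_cc_per_file(file_ccs):
    """Overall CC approximation divided by the number of files N."""
    return combined_cc(file_ccs) / len(file_ccs)


# Hypothetical per-file CC values, for illustration only.
file_ccs = [1, 4, 12, 3]

overall = combined_cc(file_ccs)          # (1 + 4 + 12 + 3) - 3
average = average_cc_per_file(file_ccs)  # overall / 4
print(overall, average)
```

Note that this differs from the naive mean only by the (N-1)/N correction, so for a large number of files the two values end up close to each other.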

Upvotes: 5

Andrew

Reputation: 2135

As you said, an average is not very helpful, because a large number of simple functions may "hide" one very complex one. So I prefer to compare distribution graphs, which are more informative.

Disclaimer: I am an author of Metrix++, which does this. Please check what the distribution graph looks like: http://metrixplusplus.sourceforge.net/workflow.html#workflow_view_summary_section
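
To illustrate the idea of looking at a distribution rather than a single number, here is a rough sketch that buckets per-file CC values and prints a text histogram. It is not how Metrix++ works internally; the function names, the bucket size, and the sample data are all assumptions made for the example.

```python
from collections import Counter


def cc_distribution(file_ccs, bucket_size=5):
    """Count files per CC bucket, keyed by the bucket's lower bound."""
    counts = Counter((cc // bucket_size) * bucket_size for cc in file_ccs)
    return dict(sorted(counts.items()))


def render(distribution, bucket_size=5):
    """Render the distribution as one text bar per non-empty bucket."""
    lines = []
    for lower, count in distribution.items():
        upper = lower + bucket_size - 1
        bar = "#" * count
        lines.append(f"{lower:>3}-{upper:<3} {bar}")
    return "\n".join(lines)


# Hypothetical per-file CC values: many simple files, one very complex one.
file_ccs = [1, 1, 2, 2, 3, 3, 4, 60]

distribution = cc_distribution(file_ccs)
print(render(distribution))
```

With data like this, the single outlier shows up as its own bar instead of being averaged away.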

Upvotes: 0

Boris Baldassari

Reputation: 134

From your question, I would say you first need to define your intent regarding the average CC.

If you want to compute the average CC on the project's files, say to compare it to another project, then adding up CC from files and dividing by the number of code files is the right thing to do. But it gives you nothing better than an average: it is not representative of the intended characteristics at the individual file level. So when you say:

"since a short header file would have the same impact as a very long file"

this is wrong. The short header file and the very long file would not have the same CC, and you would not use the average CC to compare individual files.

If the average CC is used to compare projects among themselves: from a statistics perspective, software metrics tend to have heavily skewed distributions, so a median may indeed be the better choice. But once again, it heavily depends on how you intend to use it.
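
A small sketch of why the median can be more robust on skewed data, using Python's standard `statistics` module. The CC values are made up for illustration and do not come from any real project.

```python
from statistics import mean, median

# Hypothetical per-file CC values with one strongly skewing outlier.
file_ccs = [1, 1, 2, 2, 3, 3, 4, 60]

mean_cc = mean(file_ccs)      # pulled upward by the single complex file
median_cc = median(file_ccs)  # stays near the typical file

print(f"mean CC:   {mean_cc}")
print(f"median CC: {median_cc}")
```

Here the mean is several times larger than the median, even though seven of the eight files have a CC of 4 or less.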

Upvotes: 0
