Reputation: 185
I am implementing an application that calculates the readability of Java files using the readability formula proposed by Posnett, Hindle and Devanbu (here).
The formula is: z = 8.87 - 0.033 * Volume + 0.40 * Lines - 1.5 * Entropy
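In code, the formula is a one-liner. This is a minimal sketch that assumes Volume, Lines and Entropy have already been computed elsewhere; the class and method names are placeholders of mine, not something from the paper:

```java
public class ReadabilityScore {
    // Coefficients copied from the formula quoted above.
    // How volume, lines and entropy are measured is left to the caller.
    static double score(double volume, double lines, double entropy) {
        return 8.87 - 0.033 * volume + 0.40 * lines - 1.5 * entropy;
    }

    public static void main(String[] args) {
        // Hypothetical metric values, only to show the call.
        System.out.println(score(100.0, 10.0, 4.0));
    }
}
```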
They say that Entropy is calculated from the counts of terms (tokens or bytes) as well as the number of unique terms and bytes.
I did some research, but couldn't find a definition of a term in Java. The only thing I found was this, which lists some "useful Java terms", but I don't think these are the only terms in Java.
So, what should I consider as Java terms? Can anyone give me an explanation?
Upvotes: 3
Views: 2066
Reputation: 6272
You're confusing different usages of the word "term". Two relevant definitions are:
qwerty, then w is a term because it's one of those characters. This is the definition used in the entropy calculation. Specifically, "term" can mean an individual character (byte) in the source code, or a "token" in Java, which means any part of the code that the Java syntax treats as a single unit (int foo = bar-3; contains the tokens int, foo, =, bar, -, 3, and ;).
Note: When dealing with source code, a byte is often treated as synonymous with a character, because each ASCII character is stored in one byte. Non-ASCII characters can take more than one byte, depending on the encoding.
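To make the two views concrete, here is a rough sketch that splits the example statement into tokens and also counts its bytes. The regular expression is a deliberately simplified assumption of mine (identifiers, integer literals, single-character symbols), not a real Java lexer: it does not handle string literals, comments or multi-character operators such as ==. The class and method names are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SimpleTokens {
    // Identifier or keyword, integer literal, or any single non-space character.
    // Simplified on purpose: no strings, comments or multi-character operators.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    static List<String> tokenize(String source) {
        List<String> tokens = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        String line = "int foo = bar-3;";
        // Token view: prints [int, foo, =, bar, -, 3, ;]
        System.out.println(tokenize(line));
        // Byte view: prints 16 (every character, including spaces, is one byte here)
        System.out.println(line.getBytes(StandardCharsets.UTF_8).length);
    }
}
```

For real measurements you would probably want a proper lexer (for example from a Java parser library) instead of this regex, so that tokens match the language's lexical rules.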
Upvotes: 2
Reputation: 310979
It's not specific to Java. There is such a thing as a 'term' in Java, and you will find it in the JLS, but that's not what they're talking about. They are talking about tokens or bytes in general, not anything language-specific. In one place they say tokens and bytes, which appears to be a mistake.
The terms here can be bytes or tokens, and we use both in this paper. [emphasis added]
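Since the terms can be either bytes or tokens, the same entropy routine can serve both. The sketch below assumes the usual Shannon entropy over relative term frequencies, H = -sum(p * log2(p)) with p = count / total; check the paper for the exact definition it uses. The class and method names are made up for illustration, and the token list is written out by hand rather than produced by a lexer.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TermEntropy {
    // Shannon entropy in bits over the relative frequency of each unique term.
    // Returns 0.0 for an empty list.
    static <T> double entropy(List<T> terms) {
        Map<T, Integer> counts = new HashMap<>();
        for (T term : terms) {
            counts.merge(term, 1, Integer::sum);
        }
        double total = terms.size();
        double h = 0.0;
        for (int count : counts.values()) {
            double p = count / total;
            h -= p * (Math.log(p) / Math.log(2));
        }
        return h;
    }

    // Treat every byte of the UTF-8 encoded source as one term.
    static List<Byte> byteTerms(String source) {
        List<Byte> terms = new ArrayList<>();
        for (byte b : source.getBytes(StandardCharsets.UTF_8)) {
            terms.add(b);
        }
        return terms;
    }

    public static void main(String[] args) {
        // Tokens listed by hand; a real tool would get them from a lexer.
        List<String> tokens = Arrays.asList("int", "foo", "=", "bar", "-", "3", ";");
        System.out.println("token entropy: " + entropy(tokens));
        System.out.println("byte entropy:  " + entropy(byteTerms("int foo = bar-3;")));
    }
}
```

The only difference between the two variants is what gets fed in as the list of terms; the counting and the formula stay the same.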
Upvotes: 1