João Alves

Reputation: 185

What is a term in Java?

I am implementing an application that calculates the readability of Java files using the readability formula proposed by Posnett, Hindle and Devanbu (here).

The formula is: z = 8.87 - 0.033 * Volume + 0.40 * Lines - 1.5 * Entropy
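For reference, this is roughly how I plug the numbers in. It is only a sketch: it assumes Volume, Lines and Entropy have already been computed elsewhere, and the class and method names are my own, not from the paper.

```java
// Hypothetical helper: evaluates the formula quoted above, nothing more.
class ReadabilityScore {

    // volume, lines and entropy are assumed to be computed elsewhere.
    static double score(double volume, double lines, double entropy) {
        return 8.87 - 0.033 * volume + 0.40 * lines - 1.5 * entropy;
    }

    public static void main(String[] args) {
        // Made-up input values, used only to show the call.
        System.out.println(score(100.0, 10.0, 2.0));
    }
}
```

The part I am stuck on is the entropy argument, since it depends on what counts as a term.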

They say that Entropy is calculated from the counts of terms (tokens or bytes) as well as the number of unique terms and bytes.

I did some research, but couldn't find a definition of a term in Java. The only thing I found was this, which lists some "useful Java terms", but I don't think these are the only terms in Java.

So, what should I consider as Java terms? Can anyone give me an explanation?

Upvotes: 3

Views: 2066

Answers (2)

k_ssb

Reputation: 6272

You're confusing different usages of the word "term". Two relevant definitions are:

  • A word/phrase that has a special meaning in a particular context. A biology teacher might say "make sure to study the terms from Chapter 14 for the quiz tomorrow". This is the usage of "term" in your list of "useful Java terms".
  • One element in a sequence of things. For instance, if you have a sequence of characters qwerty, then w is a term because it's one of those characters. This is the definition used in the entropy calculation. Specifically, "term" can mean an individual character (byte) in the source code, or a "token" in Java, which means any part of the code that means one thing in the Java syntax (int foo = bar-3; contains the tokens int, foo, =, bar, -, 3, and ;).

Note: When dealing with programming, a byte is sometimes treated as synonymous with a character, because in ASCII-compatible encodings each ASCII character is stored in one byte.
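To make the second definition concrete, here is a minimal sketch of counting terms and computing entropy from those counts. It assumes the entropy in question is Shannon entropy in bits, computed from each term's relative frequency. The tokenizer is a deliberately crude, hypothetical regex (identifiers, integer literals, or any single non-whitespace character), not a real Java lexer, and all names are made up for illustration.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class TermEntropy {

    // Assumption: a rough approximation of Java tokens, not the JLS lexical grammar.
    private static final Pattern TOKEN =
            Pattern.compile("[A-Za-z_$][A-Za-z0-9_$]*|\\d+|\\S");

    // Terms as tokens: split the source with the crude pattern above.
    static List<String> tokens(String source) {
        List<String> out = new ArrayList<>();
        Matcher m = TOKEN.matcher(source);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    // Terms as bytes: one term per byte of the UTF-8 encoded source.
    static List<String> bytes(String source) {
        List<String> out = new ArrayList<>();
        for (byte b : source.getBytes(StandardCharsets.UTF_8)) {
            out.add(Byte.toString(b));
        }
        return out;
    }

    // Shannon entropy in bits: -sum(p * log2(p)) over the unique terms.
    static double entropy(List<String> terms) {
        Map<String, Integer> counts = new HashMap<>();
        for (String t : terms) {
            counts.merge(t, 1, Integer::sum);
        }
        double h = 0.0;
        for (int c : counts.values()) {
            double p = (double) c / terms.size();
            h -= p * Math.log(p) / Math.log(2);
        }
        return h;
    }

    public static void main(String[] args) {
        String code = "int foo = bar-3;";
        System.out.println(tokens(code));
        System.out.println("token entropy: " + entropy(tokens(code)));
        System.out.println("byte entropy:  " + entropy(bytes(code)));
    }
}
```

For int foo = bar-3; the token version sees seven distinct tokens, each occurring once, so its entropy is log2(7), about 2.81 bits. The byte version gives a different number because it counts repeated characters such as spaces and o.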

Upvotes: 2

user207421

Reputation: 310979

It's not specific to Java. There is such a thing as a 'term' in Java, and you will find it in the JLS, but that's not what they're talking about. They are talking about tokens or bytes in general, not anything language-specific. In one place they say tokens and bytes, which appears to be a mistake. The paper itself says:

The terms here can be bytes or tokens, and we use both in this paper. [emphasis added]

Upvotes: 1
