Reputation: 4474
For a code generation tool I'm working on, I need to take an arbitrary string and generate a valid Java variable name from it, but I'm not sure about the best way to do it.
For example:
"123 this is some message !"
=> _123_this_is_some_message
(or something similar)
Thanks
Upvotes: 7
Views: 5744
Reputation: 30848
You want to convert random strings into valid Java identifiers. According to the Java Language Specification, §3.8, the definition of an identifier is as follows:
Identifier:
    IdentifierChars but not a Keyword or BooleanLiteral or NullLiteral

IdentifierChars:
    JavaLetter
    IdentifierChars JavaLetterOrDigit

JavaLetter:
    any Unicode character that is a Java letter

JavaLetterOrDigit:
    any Unicode character that is a Java letter-or-digit
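As a rough illustration, the grammar above can be checked directly in Java. This is a sketch: the class and method names (IdentifierCheck, isValidIdentifier) are hypothetical, and it relies on javax.lang.model.SourceVersion.isKeyword, which reports keywords as well as the literals true, false and null. It walks code points rather than chars so that characters outside the Basic Multilingual Plane are tested whole.

```java
import javax.lang.model.SourceVersion;

public class IdentifierCheck {
    // Direct transcription of the grammar:
    //   IdentifierChars = JavaLetter followed by zero or more JavaLetterOrDigit
    //   Identifier      = IdentifierChars, but not a Keyword, BooleanLiteral or NullLiteral
    public static boolean isValidIdentifier(String s) {
        if (s.isEmpty()) {
            return false; // IdentifierChars needs at least one JavaLetter
        }
        int[] cps = s.codePoints().toArray();
        if (!Character.isJavaIdentifierStart(cps[0])) {
            return false; // first code point must be a "Java letter"
        }
        for (int i = 1; i < cps.length; i++) {
            if (!Character.isJavaIdentifierPart(cps[i])) {
                return false; // the rest must be "Java letter-or-digit"
            }
        }
        // Exclude keywords and the true/false/null literals
        return !SourceVersion.isKeyword(s);
    }

    public static void main(String[] args) {
        for (String s : new String[] {"myVar", "_123", "123abc", "has space", "null", "int"}) {
            System.out.println(s + " -> " + isValidIdentifier(s));
        }
    }
}
```

The JDK's own SourceVersion.isName(CharSequence) performs a comparable check (it also accepts dotted qualified names), so it can serve as a cross-check.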
All you have to do, then, is step through your input and replace any invalid character with a valid one (e.g. an underscore) or remove it altogether. Java even provides methods in the Character class that tell you whether a given character is a JavaLetter or a JavaLetterOrDigit: isJavaIdentifierStart() and isJavaIdentifierPart(). (Checking against these methods is much easier than trying to enumerate and exclude the invalid characters yourself.)
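A minimal sketch of that per-character pass, assuming you only want the letter-or-digit check at this stage (the leading-character and keyword checks from the next paragraph still apply afterwards). StepThrough and clean are hypothetical names; the replaceInvalid flag switches between the two options described: replacing with an underscore or dropping the character.

```java
public class StepThrough {
    // Walks the input one code point at a time (so supplementary characters are
    // tested whole) and keeps only "Java letter-or-digit" code points.
    //   replaceInvalid == true  -> each invalid code point becomes '_'
    //   replaceInvalid == false -> invalid code points are dropped
    public static String clean(String input, boolean replaceInvalid) {
        StringBuilder sb = new StringBuilder();
        for (int cp : input.codePoints().toArray()) {
            if (Character.isJavaIdentifierPart(cp)) {
                sb.appendCodePoint(cp);
            } else if (replaceInvalid) {
                sb.append('_');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String msg = "123 this is some message !";
        System.out.println(clean(msg, true));  // 123_this_is_some_message__
        System.out.println(clean(msg, false)); // 123thisissomemessage
    }
}
```

Note that both results still start with a digit, which is why the final checks below are needed.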
At the end, make sure your result doesn't start with a digit and isn't a keyword or literal (true, false, null). If collisions are possible and undesired, you could append numbers to your results as needed to obtain unique values.
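Those final checks could look roughly like this, assuming the candidate has already been through a per-character cleanup. FinishIdentifier and finish are hypothetical names, and appending "_" to a reserved word is just one possible fix. SourceVersion.isKeyword reports keywords plus true, false and null; depending on the JDK version it may also treat a lone "_" as a keyword, which the same rule then handles.

```java
import javax.lang.model.SourceVersion;

public class FinishIdentifier {
    // Applies the last two rules to an already-cleaned candidate:
    //  1. prefix '_' if the string is empty or cannot start an identifier (e.g. a digit)
    //  2. append '_' if the result is reserved (keyword, true, false, null)
    public static String finish(String candidate) {
        String id = candidate;
        if (id.isEmpty() || !Character.isJavaIdentifierStart(id.codePointAt(0))) {
            id = "_" + id;
        }
        if (SourceVersion.isKeyword(id)) {
            id = id + "_";
        }
        return id;
    }

    public static void main(String[] args) {
        for (String s : new String[] {"9lives", "class", "null", "message"}) {
            String id = finish(s);
            System.out.println(s + " -> " + id + " (valid: " + SourceVersion.isName(id) + ")");
        }
    }
}
```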
Upvotes: 4
Reputation: 43683
You should:
- replace \\s+ with _
- remove \\W+ (replace it with "")
- add _ as a prefix if ^\d matches (or even if it doesn't)
So something like:
"_" + myString.replaceAll("\\s+", "_").replaceAll("\\W+", "")
Upvotes: 1
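A runnable variation on the regex answer above (a modification, not the answer's exact code): it removes the punctuation before collapsing whitespace and trims in between, so trailing spaces or symbols don't leave a dangling underscore, and it only prefixes "_" when ^\d matches. RegexIdentifier and toIdentifier are hypothetical names. Two caveats: Java's \w is ASCII-only by default, so non-ASCII letters that are legal in Java identifiers get stripped, and neither keywords nor empty input are handled here.

```java
public class RegexIdentifier {
    // 1. drop characters that are neither word characters nor whitespace
    // 2. trim so leading/trailing whitespace doesn't turn into '_'
    // 3. collapse each whitespace run into a single '_'
    // 4. prefix '_' when the result starts with a digit (the ^\d case)
    public static String toIdentifier(String s) {
        String id = s.replaceAll("[^\\w\\s]+", "")
                     .trim()
                     .replaceAll("\\s+", "_");
        if (id.matches("\\d.*")) {
            id = "_" + id;
        }
        return id;
    }

    public static void main(String[] args) {
        System.out.println(toIdentifier("123 this is some message !")); // _123_this_is_some_message
    }
}
```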
Reputation: 328737
Assuming you replace all invalid characters with _, something like the code below could work (rough example). You might want to add some logic for name collisions, etc. It is based on JLS §3.8:
An identifier is an unlimited-length sequence of Java letters and Java digits, the first of which must be a Java letter.
[...]
A "Java letter" is a character for which the method Character.isJavaIdentifierStart(int) returns true.
A "Java letter-or-digit" is a character for which the method Character.isJavaIdentifierPart(int) returns true.
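public class IdentifierDemo {
    public static void main(String[] args) {
        String s = "123 sdkjh s;sdlkjh d";
        StringBuilder sb = new StringBuilder();
        // An identifier must start with a Java letter: prefix "_" otherwise
        // (the isEmpty() check also avoids charAt(0) failing on "")
        if (s.isEmpty() || !Character.isJavaIdentifierStart(s.charAt(0))) {
            sb.append("_");
        }
        // Keep Java letter-or-digit characters, replace everything else with "_"
        for (char c : s.toCharArray()) {
            if (!Character.isJavaIdentifierPart(c)) {
                sb.append("_");
            } else {
                sb.append(c);
            }
        }
        System.out.println(sb); // _123_sdkjh_s_sdlkjh_d
    }
}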
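For the name-collision logic mentioned above, one option (an assumption, not part of the answer) is to remember every name generated so far and append an increasing number until an unused one is found. UniqueNames and unique are hypothetical names, and the input is assumed to be an already-sanitized identifier such as the output of the code above. Appending ASCII digits keeps a valid identifier valid, and no Java keyword or literal contains a digit, so a suffixed name cannot become reserved.

```java
import java.util.HashSet;
import java.util.Set;

public class UniqueNames {
    // Every name handed out so far
    private final Set<String> used = new HashSet<>();

    // Returns base if it is still free, otherwise base2, base3, ... (first free one).
    // Set.add returns false when the name is already taken.
    public String unique(String base) {
        if (used.add(base)) {
            return base;
        }
        for (int n = 2; ; n++) {
            String candidate = base + n;
            if (used.add(candidate)) {
                return candidate;
            }
        }
    }

    public static void main(String[] args) {
        UniqueNames names = new UniqueNames();
        System.out.println(names.unique("_123_sdkjh")); // _123_sdkjh
        System.out.println(names.unique("_123_sdkjh")); // _123_sdkjh2
        System.out.println(names.unique("_123_sdkjh")); // _123_sdkjh3
    }
}
```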
Upvotes: 11