I am parsing a PDF and getting a lot of Strings containing \t, \r, \n and \s characters. They appear at both ends of the String, in no particular order, so I can have, for example:
"\t\s\t\nSome important data I need surrounded by useless data \r\t\s\s\r\t\t"
Is there an efficient way to trim these Strings?
Here is what I have so far. It isn't good enough, because I also want to keep some symbols, not just letters:
```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Characters worth keeping: letters, '_' and ; . ( ) * ? : " '
private static final Pattern KEEP =
        Pattern.compile("[A-Z_a-z\\;\\.\\(\\)\\*\\?\\:\\\"\\']");

public static String trimToLetters(String sourceString) {
    Matcher matcher = KEEP.matcher(sourceString);
    if (!matcher.find()) {
        // No character worth keeping anywhere in the string
        return "";
    }
    int beginIndex = matcher.start(); // first kept character (inclusive)

    // Search the reversed string to locate the last kept character
    String sourceReverse = new StringBuilder(sourceString).reverse().toString();
    matcher = KEEP.matcher(sourceReverse);
    matcher.find(); // always succeeds, because the forward search did
    int endIndex = sourceString.length() - matcher.start(); // exclusive

    return sourceString.substring(beginIndex, endIndex);
}
```
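The trim method of String should remove all whitespace from both ends of the string. From the String.trim documentation: "Returns a copy of the string, with leading and trailing whitespace omitted." Here "whitespace" means any character with a code up to U+0020, which includes \t, \n and \r.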
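A minimal sketch of trim() applied to the sample string from the question, with a plain space standing in for the asker's \s. The class name TrimDemo is only for illustration.

```java
public class TrimDemo {
    public static void main(String[] args) {
        // Tabs, spaces, LF and CR on both ends, in no particular order
        String raw = "\t \t\nSome important data I need surrounded by useless data \r\t  \r\t\t";

        // trim() removes every leading and trailing char <= U+0020
        String cleaned = raw.trim();

        System.out.println("[" + cleaned + "]");
        // prints [Some important data I need surrounded by useless data]
    }
}
```

On Java 11 and later, strip() works the same way but decides what to remove with Character.isWhitespace, so it also drops Unicode spaces such as U+2003. Neither method removes the non-breaking space U+00A0, which text extracted from PDFs sometimes contains.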
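P.S. \s was not a valid escape sequence in Java string literals when this was written. Since Java 15 it is, but it stands for a single space (U+0020), not "any whitespace".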
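If \s was meant as the regex whitespace class, the backslash must itself be escaped in a Java string literal ("\\s") so that it reaches the regex engine. Here is a sketch that assumes only leading and trailing whitespace should be removed. The helper name trimWhitespace is hypothetical.

```java
public class RegexTrimDemo {
    // "\\s" in source is the regex class [ \t\n\x0B\f\r].
    // Anchored with ^ and $, the pattern only touches the two ends.
    static String trimWhitespace(String s) {
        return s.replaceAll("^\\s+|\\s+$", "");
    }

    public static void main(String[] args) {
        String raw = "\t \t\nSome important data \r\t  \r\t\t";
        System.out.println("[" + trimWhitespace(raw) + "]");
        // prints [Some important data]
    }
}
```

For plain whitespace, trim() is simpler and avoids regex overhead. The regex form mainly pays off when the set of characters to strip needs customizing.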