Reputation: 1016
I've been working on a function that is searching heirarchally organised text. Since the files I'm working with are large in both size and number, the matter of optimisation (both speed and memory) is becoming increasingly important and I'm looking on how to improve an algorithm
public NestLink<String> searchsubs(String input, NestLink<String> searchTags) {
// parameter tests
int startIndex = 0; // start of the resultant subsection
int endIndex = 0; // end of the resultant subsection
String openTag = searchTags.elem1; // start of the string I need
String closeTag = searchTags.elem2; // end of the string I need
String subString = null; // stores the result in between
NestLink<String> node = null; // temp variable
NestLink<String> out = null; // output variable
while (true) {
// find the opening string
startIndex = input.indexOf(openTag, endIndex);
if (startIndex == -1) {
break; // if no more are found, break from the loop
} else {
}
endIndex = input.indexOf(closeTag, startIndex);
if (endIndex == -1) {
break; // if tag isn't closed, break from the loop
} else {
}
// we now have a pair of tags with a content between
subString = input.substring(startIndex + openTag.length(), endIndex);
// store what we found, method unimportant
// search this content for each subsearch in the heirarchy
for (NestLink<String> subSearch : searchTags.subBranches) {
// recurse
node = subBlockParser(subString, subSearch);
// do stuff with results
}
}
//final do stuff
return out;
}
note: NestLink is a customised tree structure, the format is unimportant, however.
The result is that for each level of the search a copy of the substring, sometimes up to 1mbyte in size, is being created which is obviously far from efficient.
To try and resolve this I considered the following:
public NestLink<String> searchsubs(String input, int substringStart, int substringEnd,
NestLink<String> searchTags) {
// parameter tests
int startIndex = substringStart; // start of the resultant subsection
int endIndex = substringStart; // end of the resultant subsection
String openTag = searchTags.elem1; // start of the string I need
String closeTag = searchTags.elem2; // end of the string I need
String subString = null; // stores the result in between
NestLink<String> node = null; // temp variable
NestLink<String> out = null; // output variable
while (true) {
// find the opening string
startIndex = input.indexOf(openTag, endIndex);
if (startIndex == -1 || startIndex >= substringEnd) {
break; // if no more are found, break from the loop
} else {
}
endIndex = input.indexOf(closeTag, startIndex);
if (endIndex == -1 || endIndex >= substringEnd) {
break; // if tag isn't closed, break from the loop
} else {
}
// we now have a pair of tags with a content between
// store what we found, method unimportant
// search this content for each subsearch in the heirarchy
for (NestLink<String> subSearch : searchTags.subBranches) {
// recurse, this time sharing input, but with a new substring start and end to serve as bounds
node =
subBlockParser(input, startIndex + openTag.length(), endIndex, subSearch);
// do stuff with results
}
}
// final do stuff
return out;
}
This time rather than creating a substring I send the input and a set of bounds. This raises the question, how would the JRE handle this? Would it copy the input string (resulting in a reduced performance as an even bigger string is being copied now), or would it simply pass a pointer object as it would with othre objects and all recursions share the same string object (a significant performance increase as there is no copying)
Alternatively, are there any other concepts that may be applicable to a heirarchal search? and heirarchal result?
K.Barad
Upvotes: 0
Views: 132
Reputation: 346417
I'm afraid the Java standard API already implements this optimization, so there is nothing to be gained from it.
A java.lang.String
consists of an underlying char[]
, an offset and a length. substring()
reuses the char[]
and merely adjusts the offset and length.
I suggest using a profiler on your code to find out where it actually spends most of the time, before you think of any optimizations. That will prevent you from wasting your efforts like this, and I can almost guarantee that you will be surprised by the result.
Upvotes: 1
Reputation: 692053
substring
doesn't create a new String containing a copy of the chars in the original string. It just returns a String object sharing the same char array as the original string, but with a different offset and length.
So your second implementation is similar to the first one, but more complex (because it does what String does internally).
Upvotes: 4