Reputation: 2593
I am trying to search for a substring in a string, but I figure there has to be a more efficient way than this:
//search for volume
if AnsiContainsStr(SearchString, 'v1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'V1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'Volume1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'Volume 1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'Vol1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'vol1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'Vol 1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'vol 1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'Vol.1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'vol.1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'Vol. 1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'vol. 1') then
Volume := '1';
if AnsiContainsStr(SearchString, 'v2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'V2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'Volume2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'Volume 2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'Vol2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'vol2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'Vol 2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'vol 2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'Vol.2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'vol.2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'Vol. 2') then
Volume := '2';
if AnsiContainsStr(SearchString, 'vol. 2') then
Volume := '2';
Upvotes: 2
Views: 2686
Reputation: 596156
Try something like this:
const
  Prefixes: array[0..6] of String = (
    'VOLUME ',
    'VOLUME',
    'VOL. ',
    'VOL ',
    'VOL.',
    'VOL',
    'V'
  );
var
  S: String;
  P: PChar;
  J, Len, PrefixLen: Integer;
  Volume: Char;
begin
  Volume := #0;
  // Upper-case once so every comparison below can be case-sensitive
  S := UpperCase(SearchString);
  P := PChar(S);
  Len := Length(S);
  while (Len > 0) and (Volume = #0) do
  begin
    // Skip ahead to the next 'V'
    if (P^ <> 'V') then begin
      Inc(P);
      Dec(Len);
      Continue;
    end;
    // Longer prefixes are listed first; the bare 'V' always matches here
    for J := Low(Prefixes) to High(Prefixes) do
    begin
      PrefixLen := Length(Prefixes[J]);
      // The length check keeps the comparison inside the remaining text
      if (Len >= PrefixLen) and (AnsiStrLComp(P, PChar(Prefixes[J]), PrefixLen) = 0) then
      begin
        Inc(P, PrefixLen);
        Dec(Len, PrefixLen);
        if (Len > 0) then begin
          if (P^ >= '1') and (P^ <= '7') then
            Volume := P^;
        end;
        Break;
      end;
    end;
  end;
end;
Upvotes: 4
Reputation: 774
Since you tagged this with XE2, you can use a regular expression to do this match easily:
var
  Regex: String;
begin
  // TRegEx is declared in System.RegularExpressions.
  // ^ and $ make each pattern match the whole string only.
  Regex := '^[v](ol\.?|olume)?\s*(1|\.\s*1)$';
  if TRegEx.IsMatch(SearchString, Regex, [roIgnoreCase]) then
    Volume := '1';
  Regex := '^[v](ol\.?|olume)?\s*(2|\.\s*2)$';
  if TRegEx.IsMatch(SearchString, Regex, [roIgnoreCase]) then
    Volume := '2';
end;
Now, I'm not the best at devising a regular expression, but I tested the one above and it seems to match all your variations (maybe someone else can come up with one that is more succinct).
Upvotes: 11
Reputation: 16045
If you want it easy but slow, go the RegExp way.
If you want it fast, read the answer by @LeleDumbo.
But before the real search, make an all-uppercase copy of the string with the AnsiUpperCase function. A case-insensitive search pays the case-conversion cost on every character, so it is better to make uppercase copies of both the string and the search patterns once. (Oh, @RobMcDonell already told you that :-) )
You should convert the prefixes into a tree. In this simple example it fits into a list (array): "V", "OL", "UME". In a more complex case you could be searching for V-OL-UME or V-ER-SION, which share the same start and then split into different tails.
Then read about http://en.wikipedia.org/wiki/Finite-state_machine, because that is what you have to build.
A simple draft (not covering all possible use cases, for example "Vol . 2.2") would be:
Start in the search-txt-1 state, looking at character #1. On each loop iteration you have the current state and the current character number to consider (everything to its left has already been scanned):
1. If state is search-txt-1, search for txt-1 (namely "V") at the current character and anywhere to the right (System.StrUtils.PosEx function)
1.1. If not found: exit the loop, no text found
1.2. If found: current-number := position just after the match, state := search-txt-2, next loop
2. If state is search-txt-2, search for txt-2 ("OL") at the current character only! (lazy: System.Copy(txt, current-char, System.Length(txt-2)) = txt-2; fast: a special comparison with length and offset from the Jedi Code Library)
2.1. If found: inc(current-number, length(txt-2)), state := search-txt-3, next loop
2.2. If not found: do NOT change current-number, state := skip-dot, next loop
3. If state is search-txt-3, search for txt-3 ("UME") like above
3.1. If found: inc(current-number, length(txt-3)), state := skip-dot, next loop
3.2. If not found: do NOT change current-number, state := skip-dot, next loop
4. If state is skip-dot, check whether current-char is a dot
4.1. If it is: inc(current-number), state := skip-few-blanks, next loop
4.2. If it is not: do NOT change current-number, state := skip-few-blanks, next loop
5. If state is skip-few-blanks, check whether current-char is " "
5.1. If it is: inc(current-number), state := skip-few-blanks, next loop (there may be more blanks)
5.2. If it is not: do NOT change current-number, state := maybe-number, next loop
6. If state is maybe-number, check System.Character.IsDigit(current-char)
6.1. If it is not: no number, this attempt failed, try again. Do NOT change current-number, state := search-txt-1, next loop
6.2. If it is: remember where the number started, state := reading-number, inc(current-number), next loop
7. If state is reading-number, check System.Character.IsDigit(current-char)
7.1. If it is: one more digit, state := reading-number, inc(current-number), next loop
7.2. If it is not: the number is over. Take the slice of the string from the first digit to the previous character (the last digit), convert it (StrToInt(Copy(string, number-start, number-length))) and exit the loop (you do not search for several numbers in one string, do you?)
For more complex grammars there are tools like Yacc/Bison. But for such a simple one you can make your own custom FSM; it would not be hard, and it is the fastest way. Just be very careful not to make errors in the state transitions and in the current-char number shifts.
I hope I did not make any, but you have to test it.
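The draft above could be sketched in Delphi roughly like this. Treat it as a sketch under assumptions, not a finished implementation: the names TState, MatchesAt and FindVolume are made up for illustration, the search for "V" is folded into the main loop instead of calling PosEx, digits are tested with CharInSet rather than IsDigit, and a trailing blank is appended so that a number at the very end of the string is still terminated.

```pascal
program VolumeFsmSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

type
  // One value per state in the draft (names are illustrative)
  TState = (stFindV, stOL, stUME, stDot, stBlanks, stMaybeNumber, stReadNumber);

// True if Sub occurs in S exactly at position Index (the "lazy" Copy comparison)
function MatchesAt(const S, Sub: string; Index: Integer): Boolean;
begin
  Result := Copy(S, Index, Length(Sub)) = Sub;
end;

// Returns the first volume number found in Text, or -1 if there is none
function FindVolume(const Text: string): Integer;
var
  S: string;
  State: TState;
  I, NumStart: Integer;
begin
  Result := -1;
  S := UpperCase(Text) + ' ';  // upper-case once; the blank ends a trailing number
  State := stFindV;
  I := 1;
  NumStart := 0;
  while I <= Length(S) do
  begin
    case State of
      stFindV:
        begin
          if S[I] = 'V' then
            State := stOL;
          Inc(I);
        end;
      stOL:
        begin
          if MatchesAt(S, 'OL', I) then
          begin
            Inc(I, 2);
            State := stUME;
          end
          else
            State := stDot;
        end;
      stUME:
        begin
          if MatchesAt(S, 'UME', I) then
            Inc(I, 3);
          State := stDot;
        end;
      stDot:
        begin
          if S[I] = '.' then
            Inc(I);
          State := stBlanks;
        end;
      stBlanks:
        begin
          if S[I] = ' ' then
            Inc(I)
          else
            State := stMaybeNumber;
        end;
      stMaybeNumber:
        begin
          if CharInSet(S[I], ['0'..'9']) then
          begin
            NumStart := I;
            Inc(I);
            State := stReadNumber;
          end
          else
            State := stFindV;  // not a volume marker, look for the next 'V'
        end;
      stReadNumber:
        begin
          if CharInSet(S[I], ['0'..'9']) then
            Inc(I)
          else
          begin
            // StrToIntDef avoids an exception on an overlong digit run
            Result := StrToIntDef(Copy(S, NumStart, I - NumStart), -1);
            Exit;
          end;
        end;
    end;
  end;
end;

begin
  WriteLn(FindVolume('Best of Vol. 2'));
  WriteLn(FindVolume('No marker here'));
end.
```

Every state either advances the index or hands over to a later state, and the search state always advances, so the loop cannot get stuck. Like the draft, it does not handle forms such as "Vol . 2.2".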
Upvotes: 2
Reputation: 19346
Building on @user582118's answer:
If you use ^v(ol\.?|olume)?\s*([0-9]+)$ as the RegEx pattern, you don't have to try each and every possible numerical value. It will match 1 or more numeric characters at the end. You can then use TMatch's Value and Groups properties to extract the number from the string.
var
  RegEx: TRegEx; // This is a record, not a class, and doesn't need to be freed!
  Match: TMatch;
  i: Integer;
begin
  RegEx := TRegEx.Create('^v(ol\.?|olume)?\s*([0-9]+)$');
  Match := RegEx.Match('vol.3456');
  WriteLn('Value: ' + Match.Value);
  for i := 0 to Match.Groups.Count - 1 do
    WriteLn('Group', i, ': ', Match.Groups[i].Value);
end;
Gives:
Value: vol.3456
Group0: vol.3456
Group1: ol.
Group2: 3456
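If the volume marker can sit anywhere inside a longer string, as in the question, a variation along these lines might be closer to what is needed. This is a sketch under assumptions: ExtractVolume is a hypothetical helper name, the ^ and $ anchors are dropped so the pattern can match in the middle of the text, and the prefix group is made non-capturing so the digits end up in group 1.

```pascal
program VolumeRegexSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils, System.RegularExpressions;

// Hypothetical helper: returns the digits after the first volume marker, or ''
function ExtractVolume(const Text: string): string;
var
  Match: TMatch;
begin
  // No ^ or $, so the marker may appear anywhere in Text.
  // (?:...) groups without capturing, which leaves the digits as group 1.
  Match := TRegEx.Match(Text, 'v(?:ol\.?|olume)?\s*([0-9]+)', [roIgnoreCase]);
  if Match.Success then
    Result := Match.Groups[1].Value
  else
    Result := '';
end;

begin
  WriteLn(ExtractVolume('Collected Works Vol. 12'));
end.
```

Without anchors, any "v" followed by digits counts as a marker (for example "rev2"), so a stricter pattern may be needed depending on the data.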
Upvotes: 5
Reputation: 1319
Make your search string upper case first (once), and then do each check against that upper-case version only. That halves the number of checks without requiring case-insensitive searches (which may change the case of both strings every time).
You could go a step further and use one of the wildcard match functions in the JCL, such as StrMatches. However, while this would reduce the number of lines of code, it would probably not be as fast as having the specific matches.
If you expect to have many different values for Volume, write your own function to search for the alphabetic part of the string, then do a separate check for the number that comes after it.
Upvotes: 2
Reputation: 4166
I had to do something similar once for comparing mailing addresses. I stripped out white space and punctuation. Then I used CompareText so it was case insensitive.
A lot of your If statements deal with comparing strings that may or may not have a period or space between "Vol" or "Volume" and the number. Remove the period and whitespace and you are left with two If statements per volume number: one for VOL and one for VOLUME. You might even be able to whittle that down to one If statement per volume by replacing "volume" with "vol".
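A rough sketch of that normalization in Delphi might look like the following. It rests on some assumptions: NormalizeTitle is a made-up helper name, only spaces and periods are stripped, and the text is upper-cased up front instead of being compared with CompareText.

```pascal
program NormalizeVolumeSketch;

{$APPTYPE CONSOLE}

uses
  System.SysUtils;

// Hypothetical helper: upper-cases Text, drops spaces and periods,
// and shortens VOLUME to VOL so only one spelling remains
function NormalizeTitle(const Text: string): string;
begin
  Result := UpperCase(Text);
  Result := StringReplace(Result, ' ', '', [rfReplaceAll]);
  Result := StringReplace(Result, '.', '', [rfReplaceAll]);
  Result := StringReplace(Result, 'VOLUME', 'VOL', [rfReplaceAll]);
end;

begin
  // 'Vol. 1', 'vol1' and 'Volume 1' all normalize to text containing 'VOL1'
  if Pos('VOL1', NormalizeTitle('Some Book - Volume 1')) > 0 then
    WriteLn('Volume 1');
end.
```

The bare "v1" form still needs its own check, and a plain substring test for 'VOL1' also matches 'VOL10', just as the original chain of If statements would.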
Upvotes: 3
Reputation: 9340
For a lot of strings and frequent searches, using a suffix tree would be your best bet. Otherwise, an easier way using a regular expression could also help, since your strings look regular enough.
Upvotes: 5