Zsolt
Zsolt

Reputation: 345

Fastest way to determine if a string contains a character

I have a string which consists of unicode characters. The same character can occur only once. The length of the string is between 1 and ~50.

What is the fastest way to check if a particular character is in the string or not?

Iterating the string is not a good choice, isn't it? Is there any efficient algorithm for this purpose?

My first idea was to keep the characters in the string alphabetically sorted. It could be searched quickly, but the sorting and the comparison of unicode characters are not so trivial (using the right collation) and it has a big cost, probably bigger then iterating the whole string.

Maybe some hashing? Maybe the iteration is the fastest way?

Any idea?

Upvotes: 0

Views: 2098

Answers (3)

jayadev
jayadev

Reputation: 3711

Is it a repeated operation on the same string or 1 time task ? If it is a 1 time task, then you can't do better than going through the string after all you have to look at all characters. O(n)

If it is repeated operation then you can do some preprocessing of the strings to make the subsequent operations faster. The most space efficient and fastest would be to build bloom filters for the characters in each string. Once built which is is fast too, you can say if a character is not present in 0(1) and only do a binary search of the sorted string only if bloom filter says yes.

Upvotes: 0

Mark Ransom
Mark Ransom

Reputation: 308344

A linear search through the string is O(n) with each operation being very simple. Sorting the string is O(n log n) with more complicated operations. It's pretty clear that the linear search will be faster in all cases.

If the characters are stored in UTF-8 or UTF-16 encoding then there's a possibility that you'll need to search for more than one contiguous element. There are ways to speed that up, such as Boyer-Moore or Knuth-Morris-Pratt. It's unclear whether there would be an actual speedup with such short search strings.

Upvotes: 0

Karoly Horvath
Karoly Horvath

Reputation: 96266

If there's no preprocessing, the simplest and fastest way is to iterate through the characters.

If there's preprocessing, the previous approach might still the best, or you could try a small hashtable which stores whether a string contains that character. Storing the hash will take extra space, but could be better for memory cache (with low hash collision & assuming you don't have to access the actual string). Make sure you measure the peformance.

I have a feeling you're trying to over-engineer a really simple task. Have you verified that this is a bottleneck in your application?

Upvotes: 4

Related Questions