Reputation: 93297
I have a set of strings (~80 000) I can only access sequentially by the hits.Doc(int).Get("fieldName") method.
List<string> idStrings = new List<string>();
int count = hits.Length();
for (int i = 0; i < count; i++)
{
    string idString = hits.Doc(i).Get("id");
    if (!idStrings.Contains(idString))
        idStrings.Add(idString);
}
The strings will later have to be int.TryParse()'d. I think there should be a faster way to do this. Any suggestions?
Upvotes: 5
Views: 4272
Reputation: 17152
First of all, use a HashSet<string> instead of a list; its Contains method is much faster:
int count = hits.Length();
HashSet<string> idStrings = new HashSet<string>();
EDIT: You don't even have to call Contains if you use a HashSet, since it can't contain duplicate items. Just call Add; it silently ignores values that are already present (and returns false when it does).
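A minimal sketch of that Add behaviour (the return value tells you whether the value was actually inserted):
HashSet<string> idStrings = new HashSet<string>();
bool added = idStrings.Add("42");      // true: "42" was not in the set yet
bool addedAgain = idStrings.Add("42"); // false: duplicate, the set is unchanged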
Upvotes: 13
Reputation: 700192
Use a Dictionary instead of a List. The Dictionary.ContainsKey method is much faster than the List.Contains method.
Dictionary<string, int> idStrings = new Dictionary<string, int>();
int count = hits.Length();
for (int i = 0; i < count; i++) {
    string idString = hits.Doc(i).Get("id");
    if (!idStrings.ContainsKey(idString)) {
        idStrings.Add(idString, 1);
    }
}
If you use framework 3.5 you can use a HashSet instead of a Dictionary:
HashSet<string> idStrings = new HashSet<string>();
int count = hits.Length();
for (int i = 0; i < count; i++) {
    string idString = hits.Doc(i).Get("id");
    idStrings.Add(idString);
}
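Since the strings will be int.TryParse()'d later anyway, a possible variant (just a sketch, assuming the "id" field is meant to be numeric) is to parse and deduplicate in a single pass:
HashSet<int> ids = new HashSet<int>();
int count = hits.Length();
for (int i = 0; i < count; i++)
{
    int id;
    // TryParse avoids exceptions on malformed values; non-numeric ids are simply skipped here.
    if (int.TryParse(hits.Doc(i).Get("id"), out id))
        ids.Add(id);
}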
Upvotes: 1
Reputation: 113300
The statement that slows your code down is idStrings.Contains(idString). You can try using a faster data structure than List (a tree or a hash table, maybe?).
Specifically, you need a data structure with sublinear search time. Hash tables have constant search time, while trees usually have logarithmic search time.
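For example, if HashSet is not available, a tree-based SortedDictionary<string, bool> still gives logarithmic lookups (a rough sketch, assuming the same hits object as in the question):
SortedDictionary<string, bool> seen = new SortedDictionary<string, bool>();
int count = hits.Length();
for (int i = 0; i < count; i++)
{
    string idString = hits.Doc(i).Get("id");
    // ContainsKey on the balanced tree is O(log n), versus O(n) for List.Contains.
    if (!seen.ContainsKey(idString))
        seen.Add(idString, true);
}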
Upvotes: 0