Reputation: 196469
I have very large string lists and arrays, and I found 2 issues that I want to resolve:
These can be 2 different solutions. I'm not sure whether there is a faster way than a basic loop or this:
array = array.Where(r => !String.IsNullOrEmpty(r.Trim())).ToArray();
Upvotes: 2
Views: 4946
Reputation: 26727
A classic for/foreach loop is always faster than any LINQ to Objects expression, as LINQ uses for/foreach behind the scenes.
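To illustrate the point about what happens behind the scenes, here is a minimal sketch of a Where-style filter written as an iterator. SimpleWhere and WhereSketch are hypothetical names used only for illustration; the real Enumerable.Where is more elaborate, so treat this as a simplified picture rather than the actual library code.

```csharp
using System;
using System.Collections.Generic;

public static class WhereSketch
{
    // Hypothetical stand-in for Enumerable.Where: loops with foreach and
    // invokes the predicate delegate once per element.
    public static IEnumerable<T> SimpleWhere<T>(IEnumerable<T> source, Func<T, bool> predicate)
    {
        foreach (T item in source)
        {
            if (predicate(item))
            {
                yield return item;
            }
        }
    }

    public static void Main()
    {
        var input = new List<string> { "a", "", "  ", "b" };
        foreach (string s in SimpleWhere(input, x => !string.IsNullOrWhiteSpace(x)))
        {
            Console.WriteLine(s); // prints "a" then "b"
        }
    }
}
```

The extra cost compared with a hand-written loop comes mainly from the delegate call per element and the iterator object, not from a different algorithm.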
Upvotes: 5
Reputation: 26727
As expected, the classic for loop is faster than any LINQ solution:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        // Build the test data: 1% empty strings, 99% long strings ending in a space.
        List<string> list = new List<string>();
        string longString = new string('x', 10000) + " ";
        for (int i = 0; i < 1000000; i++)
        {
            list.Add(i % 100 == 0 ? "" : longString);
        }

        // LINQ with Trim (allocates a trimmed copy of every long string).
        Stopwatch sw = Stopwatch.StartNew();
        list.Where(r => !string.IsNullOrEmpty(r.Trim())).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrEmpty(Trim): {0}", sw.ElapsedMilliseconds);
        GC.Collect();

        // LINQ with IsNullOrWhiteSpace (no allocation per string).
        sw = Stopwatch.StartNew();
        list.Where(r => !string.IsNullOrWhiteSpace(r)).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);
        GC.Collect();

        // Classic for loop copying the non-blank items into a result list.
        sw = Stopwatch.StartNew();
        List<string> listResult = new List<string>();
        int countList = list.Count;
        for (int i = 0; i < countList; i++)
        {
            string item = list[i];
            if (!string.IsNullOrWhiteSpace(item))
            {
                listResult.Add(item);
            }
        }
        sw.Stop();
        Console.WriteLine("Classic for + IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);
        GC.Collect();

        // Parallel LINQ.
        sw = Stopwatch.StartNew();
        list.AsParallel().Where(r => !string.IsNullOrWhiteSpace(r)).ToList();
        sw.Stop();
        Console.WriteLine("PLINQ IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);
        Console.Read();
    }
}
On my machine (i5), times in milliseconds:
IsNullOrEmpty(Trim): 6165
IsNullOrWhitespace: 39
Classic for + IsNullOrWhitespace: 21
PLINQ IsNullOrWhitespace: 65
Upvotes: 1
Reputation: 1500385
For a List<T> there's a potentially faster equivalent, RemoveAll, which performs the removal in place:
// We can do better than this - see below...
list.RemoveAll(r => String.IsNullOrEmpty(r.Trim()));
This may be faster in terms of how the repositioning within the list is performed.
It also depends on whether you want in-place removal, of course. Personally I'd normally prefer the LINQ approach as it's more flexible and more idiomatic these days - I usually treat collections as immutable sequences, even if they're not really :)
One point to note: you don't need to trim a string to find out whether or not it contains anything other than whitespace. You can use string.IsNullOrWhiteSpace, which should really be called IsNullOrEmptyOrWhitespace - basically "does it not have content".
This could easily make a very significant difference to performance - if you have a lot of long strings, there's no point in having an O(N) operation (Trim) just to determine that there's some content... and there's no point creating a new string when you're just going to throw it away again.
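As a small sketch of the difference, the two checks below ask the same question in two ways. The class and method names (WhitespaceCheckSketch, IsBlankViaTrim, IsBlank) are hypothetical and only there for illustration. Besides the allocation, the Trim version also throws if the element is null, which IsNullOrWhiteSpace handles.

```csharp
using System;

public static class WhitespaceCheckSketch
{
    // Trim may allocate a new string, and calling it on null throws.
    public static bool IsBlankViaTrim(string s)
    {
        return string.IsNullOrEmpty(s.Trim());
    }

    // Scans characters only until it finds a non-whitespace one; null-safe.
    public static bool IsBlank(string s)
    {
        return string.IsNullOrWhiteSpace(s);
    }

    public static void Main()
    {
        string[] samples = { "", "   ", "\t\n", "x ", " x" };
        foreach (string s in samples)
        {
            // Both checks agree for every non-null sample.
            Console.WriteLine("{0} {1}", IsBlankViaTrim(s), IsBlank(s));
        }

        Console.WriteLine(IsBlank(null)); // True
        try
        {
            IsBlankViaTrim(null);
        }
        catch (NullReferenceException)
        {
            Console.WriteLine("Trim threw on null");
        }
    }
}
```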
Note: sizes changed from earlier version to give good differentiation between final three cases...
Here's an example:
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Test
{
    static void Main()
    {
        List<string> list = new List<string>();
        string longString = new string('x', 1000) + " ";
        for (int i = 0; i < 1000000; i++)
        {
            list.Add(i % 100 == 0 ? "" : longString);
        }

        Stopwatch sw = Stopwatch.StartNew();
        list.Where(r => !string.IsNullOrEmpty(r.Trim())).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrEmpty(Trim): {0}", sw.ElapsedMilliseconds);
        GC.Collect();

        sw = Stopwatch.StartNew();
        list.Where(r => !string.IsNullOrWhiteSpace(r)).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);
        GC.Collect();

        sw = Stopwatch.StartNew();
        List<string> listResult = new List<string>();
        int countList = list.Count;
        for (int i = 0; i < countList; i++)
        {
            string item = list[i];
            if (!string.IsNullOrWhiteSpace(item))
            {
                listResult.Add(item);
            }
        }
        sw.Stop();
        Console.WriteLine("New list: {0}", sw.ElapsedMilliseconds);
        GC.Collect();

        // This has to be last, as it modifies in-place
        sw = Stopwatch.StartNew();
        list.RemoveAll(r => string.IsNullOrWhiteSpace(r));
        sw.Stop();
        Console.WriteLine("List.RemoveAll: {0}", sw.ElapsedMilliseconds);
    }
}
Sample results on my laptop:
IsNullOrEmpty(Trim): 3573
IsNullOrWhitespace: 452
New list: 232
List.RemoveAll: 153
Upvotes: 16
Reputation: 38179
You can use:
array = array.Where(r => !String.IsNullOrWhiteSpace(r)).ToArray();
If you have a List<string>:
list.RemoveAll(str => string.IsNullOrWhiteSpace(str));
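An array cannot be shrunk in place, so for the array case the filtered result has to be a new array. A minimal sketch of two ways to get one follows; ArrayFilterSketch, WithLinq and WithFindAll are hypothetical names for illustration, while Enumerable.Where, ToArray and Array.FindAll are standard library calls.

```csharp
using System;
using System.Linq;

public static class ArrayFilterSketch
{
    public static string[] WithLinq(string[] array)
    {
        // Where returns a lazy IEnumerable<string>; ToArray materializes it.
        return array.Where(r => !string.IsNullOrWhiteSpace(r)).ToArray();
    }

    public static string[] WithFindAll(string[] array)
    {
        // Array.FindAll returns a new array holding the matching elements.
        return Array.FindAll(array, r => !string.IsNullOrWhiteSpace(r));
    }

    public static void Main()
    {
        string[] array = { "a", "", null, "  ", "b" };
        Console.WriteLine(string.Join(",", WithLinq(array)));    // a,b
        Console.WriteLine(string.Join(",", WithFindAll(array))); // a,b
    }
}
```

Which of the two is faster on a given data set is something to measure rather than assume.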
Upvotes: 4
Reputation: 51329
The only thing that comes to mind is to use the IsNullOrWhiteSpace method instead.
array = array.Where(r => !String.IsNullOrWhiteSpace(r)).ToArray();
I doubt you'll notice any speed difference though. This is an extreme example of micro-optimization.
Upvotes: 5