leora

Reputation: 196469

In C#, what is the fastest way to remove blank entries in a string[] or List<string>?

I have very large string lists and arrays, and there are two issues I want to resolve:

  1. Remove all entries that are blank strings
  2. Remove all entries that are just whitespace

These can be two different solutions. I'm not sure whether there is a faster way than a basic loop or this:

array = array.Where(r => !String.IsNullOrEmpty(r.Trim())).ToArray();
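The two issues map onto two different built-in checks: string.IsNullOrEmpty only catches "" (and null), while string.IsNullOrWhiteSpace also catches strings made entirely of whitespace. Note that the snippet above calls r.Trim(), which throws a NullReferenceException if any entry is null. A minimal sketch of both filters for a string[], assuming the result should be a new array; the class and method names are hypothetical:

```csharp
using System;
using System.Linq;

// Hypothetical helper class: BlankFilters, RemoveEmpty and RemoveWhiteSpace
// are illustrative names, not part of any library.
public static class BlankFilters
{
    // Issue 1: drop entries that are null or exactly "" (length zero).
    // Entries such as "   " are kept.
    public static string[] RemoveEmpty(string[] source)
    {
        return source.Where(s => !string.IsNullOrEmpty(s)).ToArray();
    }

    // Issue 2: drop entries that are null, "" or contain only whitespace.
    // No Trim() call, so no temporary string is allocated per entry.
    public static string[] RemoveWhiteSpace(string[] source)
    {
        return source.Where(s => !string.IsNullOrWhiteSpace(s)).ToArray();
    }
}
```

For example, given { "a", "", "  ", null, "b c" }, RemoveEmpty keeps "a", "  " and "b c", while RemoveWhiteSpace keeps only "a" and "b c".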

Upvotes: 2

Views: 4946

Answers (5)

Massimiliano Peluso

Reputation: 26727

A classic for/foreach loop is generally faster than any LINQ to Objects expression, as LINQ uses the same kind of loop behind the scenes and adds iterator and delegate-call overhead on top.
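A minimal sketch of the classic-loop approach for a string[], assuming a new array is wanted as the result. The class name LoopFilter is hypothetical. It allocates a worst-case buffer once and shrinks it at the end, instead of growing a List:

```csharp
using System;

// Illustrative helper (LoopFilter is a hypothetical name): a plain for loop
// that copies non-blank entries into a buffer, then trims the buffer once.
public static class LoopFilter
{
    public static string[] RemoveBlank(string[] source)
    {
        // Worst case every entry is kept, so allocate source.Length up front.
        var buffer = new string[source.Length];
        int count = 0;
        for (int i = 0; i < source.Length; i++)
        {
            string item = source[i];
            if (!string.IsNullOrWhiteSpace(item))
            {
                buffer[count++] = item;
            }
        }
        // Shrink to the number of kept entries (no copy if nothing was removed).
        Array.Resize(ref buffer, count);
        return buffer;
    }
}
```

The trade-off is one up-front allocation as large as the input, which may matter for very large arrays that are mostly blank.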

Upvotes: 5

Massimiliano Peluso

Reputation: 26727

As expected, the classic for loop is faster than any of the LINQ solutions:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

class Program
{
    static void Main(string[] args)
    {
        List<string> list = new List<string>();
        string longString = new string('x', 10000) + " ";
        for (int i = 0; i < 1000000; i++)
        {
            list.Add(i % 100 == 0 ? "" : longString);
        }

        Stopwatch sw = Stopwatch.StartNew();
        list.Where(r => !string.IsNullOrEmpty(r.Trim())).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrEmpty(Trim): {0}", sw.ElapsedMilliseconds);

        GC.Collect();

        sw = Stopwatch.StartNew();
        list.Where(r => !string.IsNullOrWhiteSpace(r)).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);

        GC.Collect();

        sw = Stopwatch.StartNew();
        // This is the result list
        List<string> listResult = new List<string>();
        int countList = list.Count;
        for (int i = 0; i < countList; i++)
        {
            string item = list[i];
            if (!string.IsNullOrWhiteSpace(item))
            {
                listResult.Add(item);
            }
        }
        sw.Stop();
        Console.WriteLine("Classic for + IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);

        GC.Collect();

        sw = Stopwatch.StartNew();
        list.AsParallel().Where(r => !string.IsNullOrWhiteSpace(r)).ToList();
        sw.Stop();
        Console.WriteLine("PLINQ IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);

        Console.Read();
    }
}

On my machine (i5), times in milliseconds:

IsNullOrEmpty(Trim): 6165
IsNullOrWhitespace: 39
Classic for + IsNullOrWhitespace: 21
PLINQ IsNullOrWhitespace: 65

Upvotes: 1

Jon Skeet

Reputation: 1500385

For a List<T> there's a potentially-faster equivalent, which performs the removal in-place, RemoveAll:

// We can do better than this - see below...
list.RemoveAll(r => String.IsNullOrEmpty(r.Trim()));

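This may be faster because of how the repositioning within the list is performed: RemoveAll compacts the list in a single pass, shifting the kept elements down within the existing backing array, without allocating a new list.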
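To illustrate the kind of single-pass, in-place compaction RemoveAll performs, here is a hand-written equivalent. This is a sketch of the idea, not the framework's actual source, and the names InPlace and RemoveBlank are hypothetical. The point of the technique is to avoid calling Remove or RemoveAt once per blank entry, since each of those calls shifts the rest of the list and the total cost can grow quadratically:

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: illustrates the single-pass compaction idea behind
// List<T>.RemoveAll; it is not the framework's implementation.
public static class InPlace
{
    // Returns the number of removed entries, like List<T>.RemoveAll.
    public static int RemoveBlank(List<string> list)
    {
        int write = 0;
        for (int read = 0; read < list.Count; read++)
        {
            string item = list[read];
            if (!string.IsNullOrWhiteSpace(item))
            {
                // Move each kept item down to the next free slot; order is preserved.
                list[write++] = item;
            }
        }
        int removed = list.Count - write;
        // Drop the now-unused tail in one call instead of one Remove per blank entry.
        list.RemoveRange(write, removed);
        return removed;
    }
}
```

In practice list.RemoveAll(string.IsNullOrWhiteSpace) does the same job with less code; the sketch only shows why it avoids repeated shifting.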

It also depends on whether you want in-place removal, of course. Personally I'd normally prefer the LINQ approach as it's more flexible and more idiomatic these days - I usually treat collections as immutable sequences, even if they're not really :)

One point to note: you don't need to trim a string to find out whether or not it consists entirely of whitespace. You can use string.IsNullOrWhiteSpace (available from .NET 4 onwards), which should really be called IsNullOrEmptyOrWhitespace - basically "does it not have content".
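To make the difference concrete, here is a hand-written check that follows the documented behaviour of string.IsNullOrWhiteSpace (true for null, "" or a string consisting only of characters for which char.IsWhiteSpace is true). It is a sketch for illustration, not the framework's source, and WhiteSpaceCheck.HasNoContent is a hypothetical name. It stops at the first non-whitespace character and never allocates, whereas Trim() has to produce a new string whenever there is anything to trim, as with the trailing space in the benchmark's longString:

```csharp
using System;

// Hypothetical helper mirroring the documented semantics of
// string.IsNullOrWhiteSpace; not the framework's implementation.
public static class WhiteSpaceCheck
{
    public static bool HasNoContent(string s)
    {
        if (s == null) return true;
        for (int i = 0; i < s.Length; i++)
        {
            // Early exit: for "xxx...x " this returns after looking at one char.
            if (!char.IsWhiteSpace(s[i])) return false;
        }
        // Empty, or every character was whitespace.
        return true;
    }
}
```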

This could easily make a very significant difference to performance - if you have a lot of long strings, there's no point in having an O(N) operation (Trim) just to determine that there's some content... and there's no point creating a new string when you're just going to throw it away again.

Note: the sizes have changed from an earlier version of this answer, to give a clearer difference between the final three cases...

Here's an example:

using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Linq;

public class Test
{
    static void Main()
    {
        List<string> list = new List<string>();
        string longString = new string('x', 1000) + " ";
        for (int i = 0; i < 1000000; i++)
        {
            list.Add(i % 100 == 0 ? "" : longString);
        }

        Stopwatch sw = Stopwatch.StartNew();
        list.Where(r=> !string.IsNullOrEmpty(r.Trim())).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrEmpty(Trim): {0}", sw.ElapsedMilliseconds);

        GC.Collect();

        sw = Stopwatch.StartNew();
        list.Where(r=> !string.IsNullOrWhiteSpace(r)).ToList();
        sw.Stop();
        Console.WriteLine("IsNullOrWhitespace: {0}", sw.ElapsedMilliseconds);

        GC.Collect();

        sw = Stopwatch.StartNew();
        List<string> listResult = new List<string>();
        int countList = list.Count;
        for (int i = 0; i < countList; i++)
        {
            string item = list[i];
            if (!string.IsNullOrWhiteSpace(item))
            {
                listResult.Add(item);
            }
        }
        sw.Stop();
        Console.WriteLine("New list: {0}", sw.ElapsedMilliseconds);

        GC.Collect();

        // This has to be last, as it modifies in-place 
        sw = Stopwatch.StartNew();
        list.RemoveAll(r => string.IsNullOrWhiteSpace(r));
        sw.Stop();
        Console.WriteLine("List.RemoveAll: {0}", sw.ElapsedMilliseconds);
    }        
}

Sample results on my laptop (milliseconds):

IsNullOrEmpty(Trim): 3573
IsNullOrWhitespace: 452
New list: 232
List.RemoveAll: 153

Upvotes: 16

vc 74

Reputation: 38179

For an array, you can use:

array = array.Where(r => !String.IsNullOrWhiteSpace(r)).ToArray();

If you have a List<string>:

list.RemoveAll(str => string.IsNullOrWhiteSpace(str));
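Arrays are fixed-size, so string[] has no in-place RemoveAll. One non-LINQ alternative is Array.FindAll, which returns a new array containing the elements that match a predicate. A minimal sketch; the wrapper class ArrayFilter is a hypothetical name, and no speed claim is made relative to the LINQ version:

```csharp
using System;

// Hypothetical wrapper around the built-in Array.FindAll.
public static class ArrayFilter
{
    public static string[] RemoveBlank(string[] array)
    {
        // Array.FindAll builds a new array of the elements for which the
        // predicate returns true; the input array is left unchanged.
        return Array.FindAll(array, s => !string.IsNullOrWhiteSpace(s));
    }
}
```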

Upvotes: 4

Chris Shain

Reputation: 51329

The only thing that comes to mind is to use the IsNullOrWhiteSpace method instead.

array = array.Where(r => !String.IsNullOrWhiteSpace(r)).ToArray();

I doubt you'll notice any speed difference though. This is an extreme example of micro-optimization.

Upvotes: 5
