Daniel Peñalba

Reputation: 31857

How to get the line count in a string using .NET (with any line break)

I need to count the number of lines in a string. Any kind of line break can be present in the string (CR, LF or CRLF).

So possible new line chars:
* \n
* \r
* \r\n

For example, with the following input:

This is [\n]
an string that [\r]
has four [\r\n]
lines

The method should return 4 lines. Do you know of any built-in function, or has someone already implemented this?

static int GetLineCount(string input)
{
   // could you provide a good implementation for this method?
   // I want to avoid string.Split since it performs really badly
}

NOTE: Performance is important for me, because I could read large strings.

Upvotes: 3

Views: 5605

Answers (7)

kain64b

Reputation: 2326

Regex.Matches(input, "\r\n|\r|\n").Count + 1
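The order of the alternatives matters here, because the .NET regex engine tries them left to right and takes the first one that matches. A minimal sketch to compare orderings; the class and method names are made up for illustration and are not part of the answer:

```csharp
using System;
using System.Text.RegularExpressions;

static class RegexOrderDemo
{
    // Hypothetical helper: counts how many times the pattern matches the input.
    public static int CountBreaks(string input, string pattern)
    {
        return Regex.Matches(input, pattern).Count;
    }
}
```

With the input "a\r\nb", the pattern "\n|\r|\r\n" matches the \r and the \n separately and reports two breaks, while "\r\n|\r|\n" consumes the pair as one break.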

Upvotes: 2

Jon Hanna

Reputation: 113272

int count = 0;
int len = input.Length;
for(int i = 0; i != len; ++i)
  switch(input[i])
  {
    case '\r':
      ++count;
      if (i + 1 != len && input[i + 1] == '\n')
        ++i;
      break;
    case '\n':
    // Uncomment below to include all other line break sequences
    // ('\u000A' is '\n' itself, so it needs no case of its own)
    // case '\v':
    // case '\f':
    // case '\u0085':
    // case '\u2028':
    // case '\u2029':
      ++count;
      break;
  }

Simply scan through, counting the line-breaks, and in the case of \r test if the next character is \n and skip it if it is.

"Performance is important for me, because I could read large strings."

If at all possible, then, avoid holding large strings in memory at all. E.g. if they come from streams, this is pretty easy to do directly on the stream, as no more than one character of read-ahead is ever needed.
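A minimal sketch of that idea, assuming the text arrives through a TextReader (the class and method names are hypothetical). Instead of peeking ahead, it remembers whether the previous character was \r, which amounts to the same thing and needs only one character of state:

```csharp
using System;
using System.IO;

static class LineBreakCounter
{
    // Hypothetical helper: counts \n, \r and \r\n breaks without
    // loading the whole text into a string.
    public static int CountLineBreaks(TextReader reader)
    {
        int count = 0;
        bool previousWasCR = false;
        int c;
        while ((c = reader.Read()) != -1)
        {
            if (c == '\r')
            {
                ++count;
                previousWasCR = true;
            }
            else
            {
                // A '\n' directly after '\r' belongs to the same break.
                if (c == '\n' && !previousWasCR)
                    ++count;
                previousWasCR = false;
            }
        }
        return count;
    }
}
```

It can be fed a StreamReader over a file or network stream, or a StringReader for testing, e.g. LineBreakCounter.CountLineBreaks(new StringReader(text)).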

Here's another variant that doesn't count newlines at the very end of a string:

int count = 1;
int len = input.Length - 1;
for(int i = 0; i < len; ++i)
  switch(input[i])
  {
    case '\r':
      if (input[i + 1] == '\n')
      {
        if (++i >= len)
        {
          break;
        }
      }
      goto case '\n';
    case '\n':
    // Uncomment below to include all other line break sequences
    // ('\u000A' is '\n' itself, so it needs no case of its own)
    // case '\v':
    // case '\f':
    // case '\u0085':
    // case '\u2028':
    // case '\u2029':
      ++count;
      break;
  }

This therefore considers "", "a line", "a line\n" and "a line\r\n" to each be one line only, and so on.

Upvotes: 5

Devedse

Reputation: 1861

Completely manual implementation (you aren't going to get much faster than this):

public static int GetLineCount(string input)
{
    int lineCount = 0;

    for (int i = 0; i < input.Length; i++)
    {
        switch (input[i])
        {
            case '\r':
                // Treat "\r\n" as a single break: skip the '\n' that follows.
                if (i + 1 < input.Length && input[i + 1] == '\n')
                {
                    i++;
                }
                lineCount++;
                break;
            case '\n':
                lineCount++;
                break;
        }
    }

    // This is the number of line breaks; add 1 to get the number of
    // lines when the last line has no terminator.
    return lineCount;
}

Upvotes: 1

w.b

Reputation: 11228

If you want to get the number of lines, you should count only \n, as \r means a carriage return and doesn't advance to a new line:

// Requires: using System.Linq;
static int GetLineCount(string input)
{
    return input.Count(c => c == '\n');
}

Upvotes: -1

Kevin

Reputation: 160

Here is an example similar to how Microsoft does it while reading lines from a file:

int numberOfLines = 0;
string line;

using (StreamReader sr = new StreamReader(path, encoding))
    while ((line = sr.ReadLine()) != null)
        numberOfLines += 1;

For reference/reading:
http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,8d10107b7a92c5c2
http://referencesource.microsoft.com/#mscorlib/system/io/file.cs,675b2259e8706c26

Upvotes: 0

Marc Wittmann

Reputation: 2362

What about this discussion

the simple

private static int Count4(string s)
{
    int n = 0;
    foreach( var c in s )
    {
        if ( c == '\n' ) n++;
    }
    return n+1;
}

should be very fast, even with larger strings. Numerous other algorithms have been tested there. What speaks against this implementation? Unless you go as far as parallel execution, I would try this very simple approach.

Upvotes: 1

baddger964

Reputation: 1227

Is your string from a file?

I think this one does the job, and does it pretty fast:

int count = File.ReadLines(path).Count();

From: How to get Number Of Lines without Reading File To End

Upvotes: 2
