buzzzzjay
buzzzzjay

Reputation: 1150

String Builder vs Lists

I am reading in multiple files in with millions of lines and I am creating a list of all line numbers that have a specific issue. For example if a specific field is left blank or contains an invalid value.

So my question is what would be the most efficient date type to keep track of a list of numbers that could be upwards of a million number of rows. Would using String Builder, Lists, or something else be more efficient?

My end goal is to out put a message like "Specific field is blank on 1-32, 40, 45, 47, 49-51, etc. So in the case of a String Builder, I would check the previous value and if it is is only 1 more I would change it from 1 to 1-2 and if it was more than one would separate it by a comma. With the List, I would just add each number to the list and then combine then once the file has been completely read. However in this case I could have multiple list containing millions of numbers.

Here is the current code I am using to combine a list of numbers using String Builder:

string currentLine = sbCurrentLineNumbers.ToString();
string currentLineSub;

StringBuilder subCurrentLine = new StringBuilder();
StringBuilder subCurrentLineSub = new StringBuilder();

int indexLastSpace = currentLine.LastIndexOf(' ');
int indexLastDash = currentLine.LastIndexOf('-');

int currentStringInt = 0;

if (sbCurrentLineNumbers.Length == 0)
{
    sbCurrentLineNumbers.Append(lineCount);
}
else if (indexLastSpace == -1 && indexLastDash == -1)
{
    currentStringInt = Convert.ToInt32(currentLine);

    if (currentStringInt == lineCount - 1)
        sbCurrentLineNumbers.Append("-" + lineCount);
    else
    {
        sbCurrentLineNumbers.Append(", " + lineCount);
        commaCounter++;
    }
}
else if (indexLastSpace > indexLastDash)
{
    currentLineSub = currentLine.Substring(indexLastSpace);
    currentStringInt = Convert.ToInt32(currentLineSub);

    if (currentStringInt == lineCount - 1)
        sbCurrentLineNumbers.Append("-" + lineCount);
    else
    {
        sbCurrentLineNumbers.Append(", " + lineCount);
        commaCounter++;
    }
}
else if (indexLastSpace < indexLastDash)
{
    currentLineSub = currentLine.Substring(indexLastDash + 1);
    currentStringInt = Convert.ToInt32(currentLineSub);

    string charOld = currentLineSub;
    string charNew = lineCount.ToString();

    if (currentStringInt == lineCount - 1)
        sbCurrentLineNumbers.Replace(charOld, charNew);
    else
    {
        sbCurrentLineNumbers.Append(", " + lineCount);
        commaCounter++;
    }
}   

Upvotes: 4

Views: 11059

Answers (5)

Sandeep
Sandeep

Reputation: 7334

As others have pointed out, I would probably use StringBuilder. The List may have to resize many times; the new implementation of StringBuilder does not have to resize.

Upvotes: 3

Eric Nicholson
Eric Nicholson

Reputation: 4123

Is your output supposed to be human readable? If so, you'll hit the limit of what is reasonable to read, long before you have any performance/memory issues from your data structure. Use whatever is easiest for you to work with.

If the output is supposed to be machine readable, then that output might suggest an appropriate data structure.

Upvotes: 3

Tony Hopkinson
Tony Hopkinson

Reputation: 20320

Depends on how you can / want to break the code up.

Given you are reading it in line order, not sure you need a list at all. Your current desired output implies that you can't output anything until the file is completely scanned. The size of the file suggests a one pass`analysis phase would be a good idea as well, given you are going to use buffered input as opposed to reading the entire thing into memory.

I'd be tempted with an enum to describe the issue e.g Field??? is blank and then use that as the key a dictionary of string builders.

As a first thought anyway

Upvotes: 3

Casperah
Casperah

Reputation: 4554

StringBuilder serves your purpose so stick with that, if you ever need the line numbers you can easily change the code then.

Upvotes: 3

Oded
Oded

Reputation: 498972

My end goal is to out put a message like "Specific field is blank on 1-32, 40, 45, 47, 49-51

If that's the end goal, no point in going through an intermediary representation such as a List<int> - just go with a StringBuilder. You will save on memory and CPU that way.

Upvotes: 7

Related Questions