AngryHacker

Reputation: 61606

What is the fastest way to parse this string

I have a string in the following format:

[Season] [Year] [Vendor] [Geography]

so an example might be: Spring 2009 Nielsen MSA

I need to parse out Season and Year in the fastest way possible. I don't care about prettiness or cleverness, just raw speed. The language is C# using VS2008, but the assembly is being built for .NET 2.0.

Upvotes: 0

Views: 4233

Answers (7)

Spidey

Reputation: 3023

Try this.

        string str = "Spring 2009 Nielsen MSA";
        string[] words = str.Split(' ');
        str = words[0] + " " + words[1];

Upvotes: 4

Jon Skeet

Reputation: 1500285

If you only need the season and year, then:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int secondSpace = text.IndexOf(' ', firstSpace + 1);
int year = int.Parse(text.Substring(firstSpace + 1, 
                                    secondSpace - firstSpace - 1));

If you can assume the year is always four digits, this is even faster:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = int.Parse(text.Substring(firstSpace + 1, 4));

If additionally you know that all years are in the 21st century, it can get stupidly optimal:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = 2000 + 10 * (text[firstSpace + 3] - '0') 
                + text[firstSpace + 4] - '0';

which becomes even less readable but possibly faster (depending on what the JIT does) as:

int firstSpace = text.IndexOf(' ');
string season = text.Substring(0, firstSpace);
int year = 1472 + 10 * text[firstSpace + 3] + text[firstSpace + 4];

Personally I think that's at least one step too far though :)
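For readers wondering where 1472 comes from: the character '0' has code 48, so the two `- '0'` subtractions fold into a single constant, 2000 - 10 * 48 - 48 = 1472. The sketch below only checks that arithmetic; the class and method names are made up for illustration and are not part of the answer above.

```csharp
using System;

// Hypothetical helper class, used only to check the constant folding.
public static class YearConstantCheck
{
    // Readable form: convert each ASCII digit to its value, then combine.
    public static int Readable(char tens, char units)
    {
        return 2000 + 10 * (tens - '0') + (units - '0');
    }

    // Folded form: both '0' subtractions merged into the constant,
    // since 2000 - 10 * 48 - 48 = 1472 ('0' has character code 48).
    public static int Folded(char tens, char units)
    {
        return 1472 + 10 * tens + units;
    }
}
```

Both methods return the same value for any pair of ASCII digits, so any speed difference would come down to what the JIT emits.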

EDIT: Okay, taking this to extremes... you're only going to have a few seasons, right? Suppose they're "Spring", "Summer", "Fall", "Winter" then you can do:

string season;
int yearStart;
if (text[0] == 'S')
{
    season = text[1] == 'p' ? "Spring" : "Summer";
    yearStart = 7;
}
else if (text[0] == 'F')
{
    season = "Fall";
    yearStart = 5;
}
else
{
    season = "Winter";
    yearStart = 7;
}

int year = 1472 + 10 * text[yearStart + 2] + text[yearStart + 3];

This has the advantage that it will reuse the same string objects. Of course, it assumes that there's never anything wrong with the data...

Using Split as shown in Spidey's answer is certainly simpler than any of this, but I suspect it'll be slightly slower. To be honest, I'd at least try that first... have you measured the simplest code and found that it's too slow? The difference is likely to be very slight - certainly compared with whatever network or disk access you've got reading in the data in the first place.
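As a rough way to do that measurement, something like the following could be used. It is only a sketch: the class name, method names, and iteration count are assumptions chosen for illustration, and a micro-benchmark like this is sensitive to JIT warm-up and garbage collection, so treat the numbers as indicative at best.

```csharp
using System;
using System.Diagnostics;

// Hypothetical benchmark harness; names and iteration count are illustrative.
public static class ParseBenchmark
{
    // Simplest approach: split on spaces and rejoin the first two words.
    public static string ViaSplit(string text)
    {
        string[] words = text.Split(' ');
        return words[0] + " " + words[1];
    }

    // IndexOf approach: everything before the second space is "Season Year".
    // Assumes the input contains at least two spaces.
    public static string ViaIndexOf(string text)
    {
        int firstSpace = text.IndexOf(' ');
        int secondSpace = text.IndexOf(' ', firstSpace + 1);
        return text.Substring(0, secondSpace);
    }

    public static void Run()
    {
        const string text = "Spring 2009 Nielsen MSA";
        const int iterations = 1000000;
        int total = 0; // consumed below so the calls cannot be optimized away

        // Warm up both paths so JIT compilation is not included in the timing.
        total += ViaSplit(text).Length + ViaIndexOf(text).Length;

        Stopwatch sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            total += ViaSplit(text).Length;
        sw.Stop();
        Console.WriteLine("Split:   {0} ms", sw.ElapsedMilliseconds);

        sw = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
            total += ViaIndexOf(text).Length;
        sw.Stop();
        Console.WriteLine("IndexOf: {0} ms", sw.ElapsedMilliseconds);

        Console.WriteLine("(checksum {0})", total);
    }
}
```

Calling `ParseBenchmark.Run()` from a console application's `Main` in a release build, outside the debugger, should give the most representative figures.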

Upvotes: 11

Antony Koch

Reputation: 2053

string[] split = stringName.Split(' ');
string seasonAndYear = split[0] + " " + split[1];

Upvotes: 1

vgru

Reputation: 51214

To add to the other answers, if you are expecting them to be in this format:

Spring xxxx
Summer xxxx
Autumn xxxx
Winter xxxx

then an even faster way would be:

string season = text.Substring(0, 6);
int year = int.Parse(text.Substring(7, 4));

That is rather nasty, though. :)

I wouldn't even consider coding like this.

Upvotes: 5

philsquared

Reputation: 22493

I'd go with Spidey's suggestion, which should give decent enough performance with simple, easy-to-follow, easy-to-maintain code.

But if you really need to push the perf envelope (and C# is the only tool available), then a couple of loops in series that search for the spaces, followed by Substring calls to pull the strings out, would probably outdo it marginally.

You could do the same with IndexOf instead of the loops, but rolling your own may be slightly faster (but you'd have to profile that).
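A minimal sketch of the two-loop idea, assuming single-space separators and well-formed input with at least one space (there is no validation). The class and method names are hypothetical, and whether this actually beats IndexOf would need to be profiled.

```csharp
using System;

// Hypothetical helper illustrating hand-rolled space searches.
public static class ManualParser
{
    // Extracts season and year from "[Season] [Year] ...".
    // Assumes the input contains at least one space; malformed input may throw.
    public static void Parse(string text, out string season, out string year)
    {
        int i = 0;

        // First loop: advance to the first space (end of the season).
        while (i < text.Length && text[i] != ' ')
            i++;
        int firstSpace = i;

        // Second loop: advance to the second space (end of the year),
        // or to the end of the string if nothing follows the year.
        i++;
        while (i < text.Length && text[i] != ' ')
            i++;
        int secondSpace = i;

        season = text.Substring(0, firstSpace);
        year = text.Substring(firstSpace + 1, secondSpace - firstSpace - 1);
    }
}
```

Usage would look like `ManualParser.Parse("Spring 2009 Nielsen MSA", out season, out year);` with `season` and `year` declared as strings beforehand.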

Upvotes: 0

AndreyAkinshin

Reputation: 19011

Class Parser:

using System.IO;   // StringReader
using System.Text; // StringBuilder

public class Parser : StringReader {

    public Parser(string s) : base(s) {
    }

    public string NextWord() {
        while ((Peek() >= 0) && (char.IsWhiteSpace((char) Peek())))
            Read();
        StringBuilder sb = new StringBuilder();
        do {
            int next = Read();
            if (next < 0)
                break;
            char nextChar = (char) next;
            if (char.IsWhiteSpace(nextChar))
                break;
            sb.Append(nextChar);
        } while (true);
        return sb.ToString();
    }
}

Use:

    string str = "Spring 2009 Nielsen MSA";
    Parser parser = new Parser(str);
    string season = parser.NextWord();
    string year = parser.NextWord();
    string vendor = parser.NextWord();
    string geography = parser.NextWord();

Upvotes: 0

Adam Robinson

Reputation: 185643

string input = "Spring 2009 Nielsen MSA";

int seasonIndex = input.IndexOf(' ') + 1;

string season = input.Substring(0, seasonIndex - 1);
string year = input.Substring(seasonIndex, input.IndexOf(' ', seasonIndex) - seasonIndex);

Upvotes: 1
