Deserializing data best practices

Question

I have been given the task of deserializing some data. All of it has been munged into a single string in the following format:

InternalNameA8ValueDisplay NameA¬InternalNameB8ValueDisplay NameB¬ etc.

That is, each record is an internal name, the character '8', the value and the display name, followed by '¬'. For example, you'd have FirstName8JoeFirst Name¬

I have no control over how this data is serialized; it's legacy stuff.

I've thought of doing a bunch of splits on the string, or breaking it up into a char array and picking the text apart that way, but this just seems horrible: too much could go wrong. For example, a value such as a phone number could begin with '8'.
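
To make the phone-number problem concrete, here is a minimal C# sketch (C# being the language the answer below uses). The sample data and the helper name SplitNameFromRest are made up for illustration, '¬' stands in for the real separator, and it assumes internal names never contain '8'. A plain Split('8') cuts the phone record into three pieces, while String.Split with a count of 2 splits on the first '8' only and leaves the value intact.

```csharp
using System;

// Hypothetical helper: splits one record on its FIRST '8' only (count = 2),
// so an '8' inside the value, e.g. a phone number starting with 8, is kept.
// Assumes the internal name itself never contains '8'.
static string[] SplitNameFromRest(string entry) => entry.Split(new[] { '8' }, 2);

// Made-up sample; '¬' stands in for the real record separator.
var data = "FirstName8JoeFirst Name¬Phone8800555012Phone Number¬";

foreach (var entry in data.Split(new[] { '¬' }, StringSplitOptions.RemoveEmptyEntries))
{
    // Naive split: every '8' is a cut point, so the phone record breaks apart.
    int naivePieces = entry.Split('8').Length;
    string[] parts = SplitNameFromRest(entry);
    Console.WriteLine($"naive: {naivePieces} pieces | name: {parts[0]} | rest: {parts[1]}");
}
```

Note that the "rest" still holds the value and the display name together, because the format as described has no separator between them.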

How would other people approach this? Is there anything cleverer I can do to break the data down?

Note: '¬' isn't actually the character; it looks more like an arrow pointing left, but I'm away from my machine at the moment. Doh!
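
Since the exact separator is unknown, one way to pin it down is to dump the code point of every character in a line of the real data. A small C# sketch; the helper name CodePoint is hypothetical, and the sample line is the question's example rather than real data:

```csharp
using System;

// Hypothetical helper: formats a character's UTF-16 code unit as "U+XXXX".
static string CodePoint(char c) => $"U+{(int)c:X4}";

// Replace this with a line read from the real data.
var line = "FirstName8JoeFirst Name¬";

// Print every character next to its code point, so the separator that
// "looks like an arrow pointing left" can be identified exactly.
foreach (char c in line)
    Console.WriteLine($"'{c}' {CodePoint(c)}");
```

Once the code point is known, it can be written in C# source as a '\uXXXX' character literal instead of relying on how the glyph renders.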

Thanks.

Erick T · Accepted Answer

Instead of using splits, I would recommend using a simple state machine. Walk over each character until you hit a delimiter; at that point you know you're on the next field. Because only the first '8' in a record is treated as a delimiter, this takes care of issues like an "8" in a phone number.

NOTE - untested code ahead.

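using System;
using System.Collections.Generic;
using System.Text;

// '¬' stands in for the real record separator (see the note in the question).
const char NameSeparator = '8';
const char RecordSeparator = '¬';

var line = "InternalNameA8ValueDisplay NameA¬InternalNameB8ValueDisplay NameB¬";
var records = new List<string[]>();
// Field 0 = internal name; field 1 = value and display name
// (as described, the format has no separator between those two).
var fieldValues = new[] { new StringBuilder(), new StringBuilder() };
var currentField = 0;

foreach (var c in line)
{
    // Only the first '8' of a record ends the internal name; later '8's are data.
    if (c == NameSeparator && currentField == 0)
    {
        currentField++; continue;
    }

    // End of record: save its fields and reset the state for the next record.
    if (c == RecordSeparator)
    {
        records.Add(new[] { fieldValues[0].ToString(), fieldValues[1].ToString() });
        fieldValues[0].Clear(); fieldValues[1].Clear();
        currentField = 0; continue;
    }

    fieldValues[currentField].Append(c);
}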
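
The same state machine can be packaged as a reusable, streaming method. This is a sketch rather than part of the original answer: the name ParseRecords and the (Name, Rest) tuple are hypothetical, the separators are parameters because the real record separator isn't actually '¬', and a final record with no trailing separator is still returned instead of being dropped.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical reusable version of the state machine. Yields one
// (Name, Rest) pair per record; Rest holds the value and display name.
static IEnumerable<(string Name, string Rest)> ParseRecords(
    string data, char nameSeparator, char recordSeparator)
{
    var name = new StringBuilder();
    var rest = new StringBuilder();
    var inName = true;

    foreach (var c in data)
    {
        if (inName && c == nameSeparator)
        {
            inName = false; // first separator of the record: the value starts next
        }
        else if (c == recordSeparator)
        {
            yield return (name.ToString(), rest.ToString());
            name.Clear(); rest.Clear();
            inName = true; // reset for the next record
        }
        else
        {
            (inName ? name : rest).Append(c);
        }
    }

    // Emit a trailing record that was not terminated by a separator.
    if (name.Length > 0 || rest.Length > 0)
        yield return (name.ToString(), rest.ToString());
}

foreach (var entry in ParseRecords("FirstName8JoeFirst Name¬Phone8800555012Phone Number", '8', '¬'))
    Console.WriteLine($"{entry.Name} => {entry.Rest}");
```

Because it uses yield return, records are produced one at a time, so a large blob does not have to be split into an intermediate array first.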

Dealing with wonky formats: always a good time!

Good luck, Erick
