Adam Reed

Reputation: 229

Custom File Parser

I am building a parser for a custom pipe-delimited file format, and my code is getting very bulky. Could someone suggest a better way to parse this data?

Each line of the file is a record whose columns are delimited by a pipe (|). Every line starts with a record type, followed by an ID, followed by a varying number of columns.

Ex: CDI|11111|OTHERDATA|somemore|other

CEX001|123131|DATA|data

CCC|123131|DATA|data1|data2|data3|data4|data5|data6

I split each line by pipe, grab the first two columns, and then switch on the first column to call a function that parses the remaining columns into an object purpose-built for that record type. I would really like a more elegant method.

    public Dictionary<string, DataRecord> Parse()
    { 
        var data = new Dictionary<string, DataRecord>();

        var rawDataDict = new Dictionary<string, List<List<string>>>();
        foreach (var line in File.ReadLines(_path))
        {
            var split = line.Split('|');
            var Id = split[1];
            if (!rawDataDict.ContainsKey(Id))
            {
                rawDataDict.Add(Id, new List<List<string>> {split.ToList()});
            }
            else
            {
                rawDataDict[Id].Add(split.ToList());
            }
        }

        rawDataDict.ToList().ForEach(pair =>
        {
            var key = pair.Key.ToString();
            var values = pair.Value;

            foreach (var value in values)
            {

                var recordType = value[0];

                switch (recordType)
                {
                    case "CDI":
                        var cdiRecord = ParseCdi(value);
                        if (!data.ContainsKey(key))
                        {
                            data.Add(key, new DataRecord
                            {
                                Id = key, CdiRecords = new List<CdiRecord>() {  cdiRecord }
                            });
                        }
                        else
                        {
                            data[key].CdiRecords.Add(cdiRecord);
                        }
                        break;
                    case "CEX015":
                        var cexRecord = ParseCex(value);
                        if (!data.ContainsKey(key))
                        {
                            data.Add(key, new DataRecord
                            {
                                Id = key,
                                CexRecords = new List<Cex015Record>() { cexRecord }
                            });
                        }
                        else
                        {
                            data[key].CexRecords.Add(cexRecord);
                        }
                        break;
                    case "CPH":
                        CphRecord cphRecord = ParseCph(value);
                        if (!data.ContainsKey(key))
                        {
                            data.Add(key, new DataRecord
                            {
                                Id = key,
                                CphRecords = new List<CphRecord>() { cphRecord }
                            });
                        }
                        else
                        {
                            data[key].CphRecords.Add(cphRecord);
                        }
                        break;
                }
            }
        });

        return data;
    }

Upvotes: 0

Views: 1477

Answers (2)

Kevin Smith

Reputation: 14436

Try out FileHelpers; here is your exact example: http://www.filehelpers.net/example/QuickStart/ReadFileDelimited/

Given your data of

CDI|11111|OTHERDATA|Datas
CEX001|123131|DATA
CCC|123131

You could create a class to model this to allow FileHelpers to parse the delimited file:

[DelimitedRecord("|")]
public class Record
{
    public string Type { get; set; }

    public string[] Fields { get; set; }
}

Then we can let FileHelpers parse the file into this object type:

var engine = new FileHelperEngine<Record>();
var records = engine.ReadFile("Input.txt");

After we've got all the records loaded into Record objects, we can use a bit of LINQ to project them into their given types:

var cdis = records.Where(x => x.Type == "CDI")
                .Select(x => new Cdi(x.Fields[0], x.Fields[1], x.Fields[2]))
                .ToArray();

var cexs = records.Where(x => x.Type == "CEX001")
                .Select(x => new Cex(x.Fields[0], x.Fields[1]))
                .ToArray();

var cccs = records.Where(x => x.Type == "CCC")
                .Select(x => new Ccc(x.Fields[0]))
                .ToArray();

You could also simplify the above using something like AutoMapper - http://automapper.org/
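For comparison, the same two-step idea (read every line into a generic record, then project by type) can be sketched in plain C# without any library. This is only an illustration: RawRecord and RawRecordReader are hypothetical names, not part of FileHelpers, and the sketch assumes every non-empty line contains at least a record type.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical generic record: the record type plus every remaining column.
public sealed class RawRecord
{
    public RawRecord(string type, string[] fields)
    {
        Type = type;
        Fields = fields;
    }

    public string Type { get; }
    public string[] Fields { get; }
}

public static class RawRecordReader
{
    // Splits each non-empty line on '|' and groups the results by record type.
    public static ILookup<string, RawRecord> Read(IEnumerable<string> lines)
    {
        return lines
            .Where(line => !string.IsNullOrWhiteSpace(line))
            .Select(line => line.Split('|'))
            .Select(parts => new RawRecord(parts[0], parts.Skip(1).ToArray()))
            .ToLookup(record => record.Type);
    }
}
```

With this in place, something like RawRecordReader.Read(File.ReadLines("Input.txt"))["CDI"] would yield all CDI records in one pass over the file; a lookup returns an empty sequence for a record type that never appears, so no key check is needed.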

Alternatively, you could use ConditionalRecord attributes, which parse a line only if it matches a given criterion. This gets slower the more record types you have, because every engine makes its own pass over the whole input, but your code will be cleaner and FileHelpers will be doing most of the heavy lifting:

[DelimitedRecord("|")]
[ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CDI")]
public class Cdi
{
    public string Type { get; set; }

    public int Number { get; set; }

    public string Data1 { get; set; }

    public string Data2 { get; set; }

    public string Data3 { get; set; }
}

[DelimitedRecord("|")]
[ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CEX001")]
public class Cex001
{
    public string Type { get; set; }

    public int Number { get; set; }

    public string Data1 { get; set; }
}

[DelimitedRecord("|")]
[ConditionalRecord(RecordCondition.IncludeIfMatchRegex, "^CCC")]
public class Ccc
{
    public string Type { get; set; }

    public int Number { get; set; }
}


var input = @"CDI|11111|Data1|Data2|Data3
CEX001|123131|Data1
CCC|123131";

var cdiEngine = new FileHelperEngine<Cdi>();
var cdis = cdiEngine.ReadString(input);

var cexEngine = new FileHelperEngine<Cex001>();
var cexs = cexEngine.ReadString(input);

var cccEngine = new FileHelperEngine<Ccc>();
var cccs = cccEngine.ReadString(input);
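One thing to watch with prefix patterns such as "^CDI": they match any line that merely starts with those characters. The sketch below uses plain System.Text.RegularExpressions to show how such a pattern behaves on a line; it is not FileHelpers' implementation, and RecordFilter and the "CDIX" record type are made up for the illustration.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class RecordFilter
{
    // Keeps only the lines that match the pattern, similar in spirit to an
    // include-if-match-regex condition applied line by line.
    public static List<string> Matching(IEnumerable<string> lines, string pattern)
    {
        var regex = new Regex(pattern);
        return lines.Where(line => regex.IsMatch(line)).ToList();
    }
}
```

Given the lines "CDI|1|a" and "CDIX|2|b", the pattern "^CDI" keeps both, while @"^CDI\|" keeps only the first. If your record types share prefixes, including the escaped pipe in the pattern pins the match to the whole first column.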

Upvotes: 1

tinstaafl

Reputation: 6948

Your first loop isn't really doing anything other than organizing your data differently. You should be able to eliminate it and use the data as it is from the file. Something like this should give you what you want:

// data is the Dictionary<string, DataRecord> declared at the top of Parse().
foreach (var line in File.ReadLines(_path))
{
    var value = line.Split('|');
    var key = value[1];
    var recordType = value[0];

    switch (recordType)
    {
        case "CDI":
            var cdiRecord = ParseCdi(value.ToList());
            if (!data.ContainsKey(key))
            {
                data.Add(key, new DataRecord
                {
                    Id = key,
                    CdiRecords = new List<CdiRecord>() { cdiRecord }
                });
            }
            else
            {
                data[key].CdiRecords.Add(cdiRecord);
            }
            break;
        case "CEX015":
            var cexRecord = ParseCex(value.ToList());
            if (!data.ContainsKey(key))
            {
                data.Add(key, new DataRecord
                {
                    Id = key,
                    CexRecords = new List<Cex015Record>() { cexRecord }
                });
            }
            else
            {
                data[key].CexRecords.Add(cexRecord);
            }
            break;
        case "CPH":
            CphRecord cphRecord = ParseCph(value.ToList());
            if (!data.ContainsKey(key))
            {
                data.Add(key, new DataRecord
                {
                    Id = key,
                    CphRecords = new List<CphRecord>() { cphRecord }
                });
            }
            else
            {
                data[key].CphRecords.Add(cphRecord);
            }
            break;
    }
}

Caveat: This is just put together here and hasn't been properly checked for syntax.
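The repeated add-or-create branches could probably be folded away as well. Below is a minimal sketch, assuming you are free to initialise the lists inside DataRecord and to replace the switch with a table of handlers. The record classes here are simplified stand-ins for your real ones (Payload is a made-up property, and only two record types are shown), so treat it as a starting point rather than a drop-in replacement.

```csharp
using System;
using System.Collections.Generic;

// Minimal stand-ins for the question's types; the real classes have more fields.
public sealed class CdiRecord { public string Payload { get; set; } }
public sealed class CphRecord { public string Payload { get; set; } }

public sealed class DataRecord
{
    public string Id { get; set; }
    // Initialising the lists up front removes the add-or-create branches.
    public List<CdiRecord> CdiRecords { get; } = new List<CdiRecord>();
    public List<CphRecord> CphRecords { get; } = new List<CphRecord>();
}

public static class RecordParser
{
    // One handler per record type; supporting a new type means adding one entry.
    private static readonly Dictionary<string, Action<DataRecord, string[]>> Handlers =
        new Dictionary<string, Action<DataRecord, string[]>>
        {
            ["CDI"] = (record, cols) => record.CdiRecords.Add(new CdiRecord { Payload = cols[2] }),
            ["CPH"] = (record, cols) => record.CphRecords.Add(new CphRecord { Payload = cols[2] }),
        };

    public static Dictionary<string, DataRecord> Parse(IEnumerable<string> lines)
    {
        var data = new Dictionary<string, DataRecord>();
        foreach (var line in lines)
        {
            var cols = line.Split('|');
            // Assumption: skip lines without a type, an ID and one data column,
            // and record types that have no registered handler.
            if (cols.Length < 3 || !Handlers.TryGetValue(cols[0], out var handle))
            {
                continue;
            }

            var id = cols[1];
            if (!data.TryGetValue(id, out var record))
            {
                record = new DataRecord { Id = id };
                data.Add(id, record);
            }

            handle(record, cols);
        }
        return data;
    }
}
```

Calling RecordParser.Parse(File.ReadLines(_path)) would then take the place of the loop above. The lambdas are where calls to your existing ParseCdi and ParseCph functions would go.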

Upvotes: 1
