mx5_craig

Reputation: 108

How to create an object after parsing a flat text file

I am writing in C# and have had success in the past with http://www.filehelpers.net/ for parsing text files. My file format has changed from a more standard .csv format, and I now want to parse a file that looks like this:

custID: 1732
name: Juan Perez
balance: 435.00
date: 11-05-2002

custID: 554
name: Pedro Gomez
balance: 12342.30
date: 06-02-2004

How do I parse such a file? I cannot find an example of this. Instead of splitting on a delimiter, I need to find the keyword and then read the value following the ':'.

Upvotes: 2

Views: 2564

Answers (2)

ΩmegaMan

Reputation: 31656

This is an example (see it on .NET Fiddle), and it will need to be tailored to the actual file in question. Basic regular expressions can parse the file, and LINQ can then project what was parsed into class instances. The following is a pattern for parsing the data with regular expressions; again, you will need to modify it to suit your situation.


I am creating a target class, the Customer object (for this example the strings are not converted to their final types):

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Balance { get; set; }
    public string Date { get; set; }
}

Here is our example data, which matches the Customer object; it simulates the data having already been read from the file into a string:

string data = @"
custID: 1732 
name: Juan Perez
balance: 435.00
date: 11-05-2002

custID: 554
name: Pedro Gomez
balance: 12342.30
date: 06-02-2004";

With the data and target entity in hand, we will use a regular expression that maps out a pattern for how we want to parse the file. We will use named captures, the (?<NameHere> ) structures within the pattern, to separate the data from the headers for easier extraction (instead of indexing into the groups, which is also available).

string pattern = @"custID:(?<ID>[^\r\n]+)\s+name:(?<Name>[^\r\n]+)\s+balance:(?<Balance>[^\r\n]+)\s+date:(?<Date>[^\r\n]+)";

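// Requires: using System.Linq; using System.Text.RegularExpressions;
// Each match is one customer record; project it into a Customer instance.
// The captures include the whitespace after the ':', so each value is trimmed.
var KVPs = Regex.Matches(data, pattern)
                .OfType<Match>()
                .Select (mt => new Customer()
                {
                    Id = mt.Groups["ID"].Value.Trim(),
                    Name = mt.Groups["Name"].Value.Trim(),
                    Balance = mt.Groups["Balance"].Value.Trim(),
                    Date    = mt.Groups["Date"].Value.Trim(),
                })
                .ToList();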

When this is run (in LinqPad, for example), we get two class instances in a list, one per record in the data.
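The Customer class above keeps every field as a string. If typed values are needed, the conversion could look roughly like the sketch below. TypedCustomer and CustomerConverter.ToTyped are hypothetical names introduced for illustration, and the date format "MM-dd-yyyy" is an assumption: the sample dates do not reveal whether the file is month-first or day-first.

```csharp
using System;
using System.Globalization;

// Hypothetical typed counterpart of the Customer class.
public class TypedCustomer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}

public static class CustomerConverter
{
    // Hypothetical helper: turns the raw captured strings into typed values.
    // Assumes month-day-year dates; use "dd-MM-yyyy" if the file is day-first.
    public static TypedCustomer ToTyped(string id, string name, string balance, string date)
    {
        return new TypedCustomer
        {
            Id = int.Parse(id.Trim(), CultureInfo.InvariantCulture),
            Name = name.Trim(),
            // InvariantCulture makes '.' the decimal separator regardless of locale.
            Balance = decimal.Parse(balance.Trim(), CultureInfo.InvariantCulture),
            Date = DateTime.ParseExact(date.Trim(), "MM-dd-yyyy", CultureInfo.InvariantCulture)
        };
    }
}
```

Inside the Select, this could be called as CustomerConverter.ToTyped(mt.Groups["ID"].Value, mt.Groups["Name"].Value, mt.Groups["Balance"].Value, mt.Groups["Date"].Value). The Parse calls throw on malformed input, so real files may call for the TryParse variants instead.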


The data you are parsing resembles an INI file. I discuss how to use (advanced) regular expressions to parse that kind of data into a Dictionary in my article "INI Files Meet Regex and Linq in C# to Avoid the WayBack Machine of Kernal32.Dll".
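As a rough illustration of the dictionary idea (not the code from that article), the sketch below reads each blank-line-separated block into its own Dictionary, so the field names do not have to be hard-coded in the pattern. KeyValueParser is a hypothetical name, and the sketch assumes that keys never contain ':' and that no key repeats within a record.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

public static class KeyValueParser
{
    // Hypothetical helper: splits the text into records on blank lines, then
    // reads each "key: value" line of a record into a dictionary.
    public static List<Dictionary<string, string>> Parse(string data)
    {
        // Two or more consecutive line breaks (blank lines) separate records.
        string[] blocks = Regex.Split(data.Trim(), @"(?:\r?\n[ \t]*){2,}");

        // Key is the text before the first ':'; Value is the rest of the line,
        // with surrounding spaces and tabs left out of both captures.
        const string linePattern =
            @"^[ \t]*(?<Key>[^:\r\n]+?)[ \t]*:[ \t]*(?<Value>[^\r\n]*?)[ \t]*\r?$";

        return blocks
            .Select(block => Regex.Matches(block, linePattern, RegexOptions.Multiline)
                .Cast<Match>()
                // ToDictionary throws if a key repeats within one record.
                .ToDictionary(m => m.Groups["Key"].Value, m => m.Groups["Value"].Value))
            .ToList();
    }
}
```

Each dictionary can then be mapped to a Customer by looking up "custID", "name", "balance" and "date".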

Upvotes: 4

shamp00

Reputation: 11326

It's not very easy to apply OmegaMan's solution to FileHelpers, but the following may help you get started.

Assume for the time being that you only have one record. Then the following works:

[DelimitedRecord(":")]
public class ImportRecord
{
    [FieldTrim(TrimMode.Both)]
    public string Key;
    [FieldTrim(TrimMode.Both)]
    public string Value;
}

class Program
{
    static void Main(string[] args)
    {
        var engine = new FileHelperEngine<ImportRecord>();

        string fileAsString = @"custID: 1732" + Environment.NewLine +
                              @"name: Juan Perez" + Environment.NewLine +
                              @"balance: 435.00" + Environment.NewLine +
                              @"date: 11-05-2002" + Environment.NewLine;

        ImportRecord[] validRecords = engine.ReadString(fileAsString);

        var dictionary = validRecords.ToDictionary(r => r.Key, r => r.Value);

        Assert.AreEqual(dictionary["custID"], "1732");
        Assert.AreEqual(dictionary["name"], "Juan Perez");
        Assert.AreEqual(dictionary["balance"], "435.00");
        Assert.AreEqual(dictionary["date"], "11-05-2002");

        Console.ReadKey();
    }
}

But as soon as you have more than one record, you'll end up with duplicate dictionary keys and the above won't work. There are ways around this. For instance, if every record has the same number of lines (4 in your example), you can do something like the following. Note that Batch is not part of standard LINQ; it is an extension method from the MoreLINQ library.

[DelimitedRecord(":")]
[IgnoreEmptyLines()]
public class ImportRecord
{
    [FieldTrim(TrimMode.Both)]
    public string Key;
    [FieldTrim(TrimMode.Both)]
    public string Value;
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Balance { get; set; }
    public string Date { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var engine = new FileHelperEngine<ImportRecord>();

        string fileAsString =
@"custID: 1732
name: Juan Perez
balance: 435.00
date: 11-05-2002

custID: 554
name: Pedro Gomez
balance: 12342.30
date: 06-02-2004";

        ImportRecord[] validRecords = engine.ReadString(fileAsString);

        var customers = validRecords
            .Batch(4, x => x.ToDictionary(r => r.Key, r => r.Value))
            .Select(dictionary => new Customer()
                {
                    Id = dictionary["custID"],
                    Name = dictionary["name"],
                    Balance = dictionary["balance"],
                    Date = dictionary["date"]
                }).ToList();

        Customer customer1 = customers[0];
        Assert.AreEqual(customer1.Id, "1732");
        Assert.AreEqual(customer1.Name, "Juan Perez");
        Assert.AreEqual(customer1.Balance, "435.00");
        Assert.AreEqual(customer1.Date, "11-05-2002");

        Customer customer2 = customers[1];
        Assert.AreEqual(customer2.Id, "554");
        Assert.AreEqual(customer2.Name, "Pedro Gomez");
        Assert.AreEqual(customer2.Balance, "12342.30");
        Assert.AreEqual(customer2.Date, "06-02-2004");

        Console.WriteLine("All OK");
        Console.ReadKey();
    }
}
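The Batch call above is not part of standard LINQ (it appears to come from the MoreLINQ library). If adding that dependency is not an option, fixed-size grouping can be sketched with plain LINQ as below. BatchingSketch.InBatches is a hypothetical helper written for this illustration, not a library API.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BatchingSketch
{
    // Hypothetical stand-in for Batch: groups consecutive items into lists of
    // at most `size` elements by dividing each item's position by `size`.
    public static List<List<T>> InBatches<T>(IEnumerable<T> source, int size)
    {
        if (size <= 0) throw new ArgumentOutOfRangeException(nameof(size));

        return source
            .Select((item, index) => new { item, index })
            .GroupBy(x => x.index / size)   // positions 0..size-1 form group 0, and so on
            .Select(g => g.Select(x => x.item).ToList())
            .ToList();
    }
}
```

With this helper, the query would become BatchingSketch.InBatches(validRecords, 4).Select(batch => batch.ToDictionary(r => r.Key, r => r.Value)). On .NET 6 or later, the built-in Enumerable.Chunk(4) does the same grouping.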

Another alternative would be to pre-parse the content in order to transform it into a more conventional delimited file. That is, use File.ReadAllText() to get a string, then replace the line feeds within each record with a field delimiter and the empty lines between records with a newline. Then read the transformed string with FileHelperEngine.ReadString().
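A minimal sketch of that transformation follows. FlatFileTransformer is a hypothetical name, and two choices here are assumptions rather than part of the suggestion above: the key labels are dropped so that only the values remain, and "|" is the default delimiter because a comma could plausibly appear inside a value.

```csharp
using System;
using System.Linq;
using System.Text.RegularExpressions;

public static class FlatFileTransformer
{
    // Hypothetical helper: turns blank-line-separated "key: value" blocks into
    // one delimited line per record, keeping only the values.
    public static string ToDelimited(string text, string delimiter = "|")
    {
        // Two or more consecutive line breaks (blank lines) separate records.
        string[] blocks = Regex.Split(text.Trim(), @"(?:\r?\n[ \t]*){2,}");

        var rows = blocks.Select(block =>
            string.Join(delimiter, block
                .Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries)
                // Keep the text after the first ':' so values may contain colons.
                .Select(line => line.Substring(line.IndexOf(':') + 1).Trim())));

        return string.Join(Environment.NewLine, rows);
    }
}
```

The result has one line per customer, such as 1732|Juan Perez|435.00|11-05-2002, which could then be read by a class marked [DelimitedRecord("|")] with four fields in the same order. This relies on every record listing its fields in the same order.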

Upvotes: 2
