Reputation: 108
I am writing in C# and have had success in the past with FileHelpers (http://www.filehelpers.net/) for parsing text files. My file format has changed from a more standard .csv format, and I now want to parse a file that looks like this:
custID: 1732
name: Juan Perez
balance: 435.00
date: 11-05-2002
custID: 554
name: Pedro Gomez
balance: 12342.30
date: 06-02-2004
How do I parse such a file? I cannot find an example of this. Instead of splitting on a delimiter, I need to find the keyword and then read the value following the ':'.
Upvotes: 2
Views: 2564
Reputation: 31656
This is an example (see it on .NET Fiddle) that will need to be tailored to the actual file in question. Basic regular expressions can parse the file, and LINQ can then project the matches into class instances. The following pattern parses the sample data with regular expressions; again, you will need to modify it for your situation.
I am creating a target class, Customer, and for this example I am not converting the strings to their ultimately needed types:
public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Balance { get; set; }
    public string Date { get; set; }
}
Here is our example data, which matches the Customer object. At this point it simulates data that has already been read from the file into a string:
string data = @"
custID: 1732
name: Juan Perez
balance: 435.00
date: 11-05-2002
custID: 554
name: Pedro Gomez
balance: 12342.30
date: 06-02-2004";
With the data and target entity in hand, we will use a regular expression that maps out the pattern of how we want to parse the file. We will use named captures, the (?<NameHere> ) structures within the pattern, to separate the data from the headers for easier extraction (instead of using numeric group indexes, which are also available).
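// Requires: using System.Linq; using System.Text.RegularExpressions;
// [ \t]* skips the spaces after each colon so the captured values carry no leading whitespace.
string pattern = @"custID:[ \t]*(?<ID>[^\r\n]+)\s+name:[ \t]*(?<Name>[^\r\n]+)\s+balance:[ \t]*(?<Balance>[^\r\n]+)\s+date:[ \t]*(?<Date>[^\r\n]+)";

// Each match covers one four-line record; project it into a Customer.
var KVPs = Regex.Matches(data, pattern)
                .OfType<Match>()
                .Select(mt => new Customer()
                {
                    Id = mt.Groups["ID"].Value,
                    Name = mt.Groups["Name"].Value,
                    Balance = mt.Groups["Balance"].Value,
                    Date = mt.Groups["Date"].Value,
                })
                .ToList();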
When this is run (here in LINQPad), we get a list containing two Customer instances, one per record.
The data you are parsing resembles an INI file. I discuss how to use more advanced regular expressions to parse that kind of data into a Dictionary in "INI Files Meet Regex and Linq in C# to Avoid the WayBack Machine of Kernal32.Dll".
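The Customer class above keeps every field as a string. If typed values are needed, a conversion step along the following lines could be added after the regex projection. This is only a sketch: TypedCustomer and CustomerConverter are hypothetical names, and the date format is an assumption, because the sample dates (11-05-2002, 06-02-2004) are valid both month-first and day-first.

```csharp
using System;
using System.Globalization;

// Hypothetical typed counterpart of the string-only Customer class.
public class TypedCustomer
{
    public int Id { get; set; }
    public string Name { get; set; }
    public decimal Balance { get; set; }
    public DateTime Date { get; set; }
}

public static class CustomerConverter
{
    // ASSUMPTION: dates are month-day-year. Use "dd-MM-yyyy" if the file is day-first.
    private const string DateFormat = "MM-dd-yyyy";

    public static TypedCustomer ToTyped(string id, string name, string balance, string date)
    {
        // InvariantCulture makes '.' the decimal separator regardless of machine settings.
        return new TypedCustomer
        {
            Id = int.Parse(id.Trim(), CultureInfo.InvariantCulture),
            Name = name.Trim(),
            Balance = decimal.Parse(balance.Trim(), CultureInfo.InvariantCulture),
            Date = DateTime.ParseExact(date.Trim(), DateFormat, CultureInfo.InvariantCulture)
        };
    }
}
```

Each parse call throws a FormatException on malformed input, so for untrusted files the TryParse / TryParseExact variants may be a better fit.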
Upvotes: 4
Reputation: 11326
It's not very easy to apply OmegaMan's solution to FileHelpers but the following may help you get started.
Assume, for the time being, that you only have one record. Then the following works:
[DelimitedRecord(":")]
public class ImportRecord
{
    [FieldTrim(TrimMode.Both)]
    public string Key;
    [FieldTrim(TrimMode.Both)]
    public string Value;
}

class Program
{
    static void Main(string[] args)
    {
        var engine = new FileHelperEngine<ImportRecord>();
        string fileAsString = @"custID: 1732" + Environment.NewLine +
                              @"name: Juan Perez" + Environment.NewLine +
                              @"balance: 435.00" + Environment.NewLine +
                              @"date: 11-05-2002" + Environment.NewLine;
        ImportRecord[] validRecords = engine.ReadString(fileAsString);
        var dictionary = validRecords.ToDictionary(r => r.Key, r => r.Value);
        Assert.AreEqual(dictionary["custID"], "1732");
        Assert.AreEqual(dictionary["name"], "Juan Perez");
        Assert.AreEqual(dictionary["balance"], "435.00");
        Assert.AreEqual(dictionary["date"], "11-05-2002");
        Console.ReadKey();
    }
}
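But as soon as you have more than one record, the keys repeat, so ToDictionary() throws an ArgumentException for the duplicate key and the above won't work. There are ways around this. For instance, if every record has the same number of lines (4 in your example), you can do something like the following. Note that Batch() is not part of standard LINQ; the overload used here is assumed to come from the MoreLINQ library.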
[DelimitedRecord(":")]
[IgnoreEmptyLines()]
public class ImportRecord
{
    [FieldTrim(TrimMode.Both)]
    public string Key;
    [FieldTrim(TrimMode.Both)]
    public string Value;
}

public class Customer
{
    public string Id { get; set; }
    public string Name { get; set; }
    public string Balance { get; set; }
    public string Date { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        var engine = new FileHelperEngine<ImportRecord>();
        string fileAsString =
@"custID: 1732
name: Juan Perez
balance: 435.00
date: 11-05-2002
custID: 554
name: Pedro Gomez
balance: 12342.30
date: 06-02-2004";
        ImportRecord[] validRecords = engine.ReadString(fileAsString);
        // Group the flat key/value lines into batches of 4 (one batch per record).
        var customers = validRecords
            .Batch(4, x => x.ToDictionary(r => r.Key, r => r.Value))
            .Select(dictionary => new Customer()
            {
                Id = dictionary["custID"],
                Name = dictionary["name"],
                Balance = dictionary["balance"],
                Date = dictionary["date"]
            }).ToList();
        Customer customer1 = customers[0];
        Assert.AreEqual(customer1.Id, "1732");
        Assert.AreEqual(customer1.Name, "Juan Perez");
        Assert.AreEqual(customer1.Balance, "435.00");
        Assert.AreEqual(customer1.Date, "11-05-2002");
        Customer customer2 = customers[1];
        Assert.AreEqual(customer2.Id, "554");
        Assert.AreEqual(customer2.Name, "Pedro Gomez");
        Assert.AreEqual(customer2.Balance, "12342.30");
        Assert.AreEqual(customer2.Date, "06-02-2004");
        Console.WriteLine("All OK");
        Console.ReadKey();
    }
}
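The fixed batch size only works while every record has exactly four lines. If some records may omit a field, one possible variation is to start a new record each time the custID key appears. The sketch below is not part of FileHelpers; RecordGrouper and GroupByStartKey are hypothetical names, and it assumes custID is always the first line of a record.

```csharp
using System;
using System.Collections.Generic;

public static class RecordGrouper
{
    // Splits a flat sequence of key/value pairs into one dictionary per record.
    // A new record begins each time startKey is seen (assumed to be "custID").
    public static List<Dictionary<string, string>> GroupByStartKey(
        IEnumerable<KeyValuePair<string, string>> pairs, string startKey)
    {
        var records = new List<Dictionary<string, string>>();
        Dictionary<string, string> current = null;

        foreach (var pair in pairs)
        {
            // Also open a record if the input does not begin with startKey.
            if (pair.Key == startKey || current == null)
            {
                current = new Dictionary<string, string>();
                records.Add(current);
            }
            current[pair.Key] = pair.Value; // last value wins if a key repeats within a record
        }
        return records;
    }
}
```

With the ImportRecord array from above, this could be called as RecordGrouper.GroupByStartKey(validRecords.Select(r => new KeyValuePair<string, string>(r.Key, r.Value)), "custID"), with TryGetValue used to read fields that may be missing.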
Another alternative would be to pre-parse the content to transform it into a more conventional CSV file. That is, use File.ReadAllText() to get a string, then replace line feeds with a field delimiter and empty lines with a newline. Then read the transformed string with FileHelperEngine.ReadString().
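The pre-parsing step could be sketched roughly as follows. KeyValueToCsv and Transform are hypothetical names, and the code rests on several assumptions: records are separated by blank lines, every record lists its fields in the same order, and no value contains the delimiter. It also drops the "key:" prefix from each line so that only the values remain, which goes slightly beyond the plain replacement described above.

```csharp
using System;
using System.Collections.Generic;

public static class KeyValueToCsv
{
    // Turns blocks of "key: value" lines into one delimited line per block.
    public static string Transform(string text, char delimiter = ',')
    {
        var rows = new List<string>();
        var fields = new List<string>();

        foreach (var rawLine in text.Split('\n'))
        {
            var line = rawLine.Trim(); // also removes a trailing '\r'
            if (line.Length == 0)
            {
                Flush(rows, fields, delimiter); // a blank line ends the current record
                continue;
            }
            // Keep only the text after the first ':'; the key itself is dropped.
            int colon = line.IndexOf(':');
            fields.Add(colon >= 0 ? line.Substring(colon + 1).Trim() : line);
        }
        Flush(rows, fields, delimiter); // final record when the text has no trailing blank line
        return string.Join(Environment.NewLine, rows);
    }

    private static void Flush(List<string> rows, List<string> fields, char delimiter)
    {
        if (fields.Count == 0) return; // nothing collected, e.g. consecutive blank lines
        rows.Add(string.Join(delimiter.ToString(), fields));
        fields.Clear();
    }
}
```

The resulting string could then be passed to engine.ReadString() with a record class marked [DelimitedRecord(",")] that declares the four fields in file order.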
Upvotes: 2