Reputation: 759
I have a text like:
Title A
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some random text
Title B
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
Title C
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
I want to parse the text based on the literal Status: and get an array of items, each with a title, description lines, and a status. I'm using C# 4.0.
Upvotes: 0
Views: 249
Reputation: 31721
string data = @"Title A
Status: Nothing But Net!
Title B
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
Title C
Can't stop the invisible Man
Credo Quia Absurdium Est
Status: C Status";
string pattern = @"
^(?:Title\s+)
(?<Title>[^\s]+)
(?:[\r\n\s]+)
(?<Description>.*?)
(?:^Status:\s*)
(?<Status>[^\r\n]+)
";
// IgnorePatternWhitespace lets the pattern span multiple lines; Singleline lets . match newlines
// (so Description can cover several lines); Multiline makes ^ match at the start of each line.
Regex.Matches(data, pattern, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select(mt => new
     {
         Title = mt.Groups["Title"].Value,
         Description = mt.Groups["Description"].Value.Trim(),
         Status = mt.Groups["Status"].Value.Trim()
     })
     .ToList() // This is here only so that ForEach can display the output
     .ForEach(item => Console.WriteLine("Title {0}: ({1}) and this description:{3}{2}{3}", item.Title, item.Status, item.Description, Environment.NewLine));
Output:
Title A: (Nothing But Net!) and this description:
Title B: (some other random text) and this description:
some description on a few lines, there may be empty lines here
some description on a few lines
Title C: (C Status) and this description:
Can't stop the invisible Man
Credo Quia Absurdium Est
Upvotes: 0
Reputation: 112782
Declare an Item type
public class Item
{
    public string Title { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}
Then split the text into lines
string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None); // handles both Windows and Unix line endings
Or read the lines from a file with
string[] lines = File.ReadAllLines(path);
Create a list of items where the result will be stored
var result = new List<Item>();
Now we can do the parsing
Item item = null; // must be initialized, otherwise the compiler reports an unassigned local variable
for (int i = 0; i < lines.Length; i++) {
    string line = lines[i];
    if (line.StartsWith("Title ")) {
        item = new Item();
        result.Add(item);
        item.Title = line.Substring(6);
    } else if (line.StartsWith("Status: ")) {
        item.Status = line.Substring(8);
    } else { // Description
        if (item.Description != null) {
            item.Description += "\r\n";
        }
        item.Description += line;
    }
}
Note that this solution has no error handling. This code assumes that the input text is always well-formed.
Upvotes: 1
Reputation: 59
If the content is structured as you describe, you can buffer the lines until you reach a Status: line:
string myRegEx = "^Status:.*$";
// loop through each line in the text ("lines" is assumed to hold the input split into lines)
foreach (string line in lines)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(line, myRegEx))
    {
        // save the buffer into the array
        // clear the buffer
    }
    else
    {
        // save the line into the buffer
    }
}
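A minimal sketch of that buffering idea, filled in under a few assumptions: the names ParsedItem and StatusBufferParser are made up for illustration, the first buffered line of each record is treated as the title (kept whole, including the word "Title"), and blank lines between records are skipped. It uses StartsWith rather than a regex, and like the other answers it does not validate the input.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical container for one parsed record.
public class ParsedItem
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Status { get; set; }
}

public static class StatusBufferParser
{
    // Buffers lines and emits one item each time a "Status:" line is reached.
    public static List<ParsedItem> Parse(string text)
    {
        var items = new List<ParsedItem>();
        var buffer = new List<string>();
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        foreach (string line in lines)
        {
            if (!line.StartsWith("Status:", StringComparison.Ordinal))
            {
                // Ignore blank lines that appear before a record's title.
                if (buffer.Count == 0 && line.Trim().Length == 0)
                {
                    continue;
                }
                buffer.Add(line);
                continue;
            }

            // A status line with nothing buffered has no title, so it is skipped.
            if (buffer.Count > 0)
            {
                string[] descriptionLines = buffer.GetRange(1, buffer.Count - 1).ToArray();
                items.Add(new ParsedItem
                {
                    Title = buffer[0].Trim(),
                    Description = string.Join(Environment.NewLine, descriptionLines).Trim(),
                    Status = line.Substring("Status:".Length).Trim()
                });
            }
            buffer.Clear();
        }

        return items;
    }
}
```

Calling StatusBufferParser.Parse(text) on the sample from the question should give three items; blank lines inside a description are kept, since only the ends of the description are trimmed.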
Upvotes: 1
Reputation: 2458
This is how I would do it (assuming it is read from text file):
Regex regStatus = new Regex(@"^Status:");
Regex regTitle = new Regex(@"^Title\s"); // titles in the sample look like "Title A", with no colon
string line;
string[] descriptionLine;
string[] statusLine;
string[] titleLine;
using (TextReader reader = File.OpenText("file.txt"))
{
    while (reader.Peek() >= 0) // Peek returns -1 at the end of the input
    {
        line = reader.ReadLine();
        if (regStatus.IsMatch(line))
        {
            // status line: convert to array; the first element can be dropped as it is "Status:"
            statusLine = line.Split(' ');
            // do stuff with array
        }
        else if (regTitle.IsMatch(line))
        {
            // title line: convert to array; the first element can be dropped as it is "Title"
            titleLine = line.Split(' ');
            // do stuff with array
        }
        else
        {
            // description line, so just split into array
            descriptionLine = line.Split(' ');
            // do stuff with array
        }
    }
}
You could then take the arrays and store them in a class if you wanted; I will leave that up to you. The code just uses a simple regex to check whether a line is a status line or a title line. Truth be told, the regex isn't even needed. You could do something like:
if(line.StartsWith("Status:")) {}
if(line.StartsWith("Title ")) {}
to check whether each line starts with Status or Title.
Upvotes: 1