Lincoln

Reputation: 759

Regex to capture text around a literal

I have text like this:

Title A
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some random text
Title B
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text
Title C
some description on a few lines, there may be empty lines here
some description on a few lines
Status: some other random text

I want to parse the text, using the literal "Status:" to mark the end of each item, into an array of items, each with a title, its description lines and a status. I'm using C# 4.0.

Upvotes: 0

Views: 249

Answers (4)

ΩmegaMan

Reputation: 31721

string data = @"Title A 


Status: Nothing But Net! 
Title B 
some description on a few lines, there may be empty lines here 
some description on a few lines 
Status: some other random text 
Title C 
Can't stop the invisible Man 
Credo Quia Absurdium Est
Status: C Status";

string pattern = @"
^(?:Title\s+)
 (?<Title>[^\s]+)
 (?:[\r\n\s]+)
 (?<Description>.*?)
  (?:^Status:\s*)
  (?<Status>[^\r\n]+)
";

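// Requires: using System; using System.Linq; using System.Text.RegularExpressions;
// IgnorePatternWhitespace lets the pattern span multiple lines (and carry comments),
// Singleline lets . match newlines, and Multiline makes ^ match at the start of every line.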
Regex.Matches(data, pattern, RegexOptions.Singleline | RegexOptions.Multiline | RegexOptions.IgnorePatternWhitespace)
     .OfType<Match>()
     .Select (mt => new
        {
            Title = mt.Groups["Title"].Value,
            Description = mt.Groups["Description"].Value.Trim(),
            Status = mt.Groups["Status"].Value.Trim()
        })
     .ToList() // ToList() is only here so that ForEach can print the results
     .ForEach(item => Console.WriteLine("Title {0}: ({1}) and this description:{3}{2}{3}", item.Title, item.Status, item.Description, Environment.NewLine));

Output:

Title A: (Nothing But Net!) and this description:


Title B: (some other random text) and this description:
some description on a few lines, there may be empty lines here 
some description on a few lines

Title C: (C Status) and this description:
Can't stop the invisible Man 
Credo Quia Absurdium Est
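The pattern above assumes every title line starts with the word Title and that the title itself is a single word ([^\s]+). If the real titles are arbitrary text, a variant can anchor on "Status:" alone: take the first non-blank line as the title and everything up to the next line that starts with "Status:" as the description. This is a sketch under that assumption, not part of the original answer; the names StatusRecord and StatusRegexParser are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical record type for the parsed items.
public class StatusRecord
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Status { get; set; }
}

public static class StatusRegexParser
{
    // Title:       the first non-blank line of a record (any text, not just "Title X").
    // Description: everything (lazily) up to the next line that starts with "Status:".
    // Status:      the rest of that "Status:" line.
    // Multiline makes ^ match at each line start; Singleline lets . cross newlines.
    private static readonly Regex RecordPattern = new Regex(
        @"^[ \t]*(?<Title>\S[^\r\n]*)\r?\n(?<Description>.*?)^Status:[ \t]*(?<Status>[^\r\n]*)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    public static List<StatusRecord> Parse(string text)
    {
        return RecordPattern.Matches(text)
            .Cast<Match>()
            .Select(m => new StatusRecord
            {
                Title = m.Groups["Title"].Value.Trim(),
                Description = m.Groups["Description"].Value.Trim(),
                Status = m.Groups["Status"].Value.Trim()
            })
            .ToList();
    }
}
```

Because the title is taken from whatever line follows the previous record, this works for both "\n" and "\r\n" line endings and does not care what the title says.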

Upvotes: 0

Olivier Jacot-Descombes

Reputation: 112782

Declare an Item type

public class Item
{
    public string Title { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}

Then split the text into lines

string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None); // handles Windows and Unix line endings

Or read the lines from a file with

string[] lines = File.ReadAllLines(path);

Create a list of items where the result will be stored

var result = new List<Item>();

Now we can do the parsing

Item item = null; // the item currently being filled; must be initialized or the code won't compile
for (int i = 0; i < lines.Length; i++) {
    string line = lines[i];
    if (line.StartsWith("Title ")) {
        item = new Item();
        result.Add(item);
        item.Title = line.Substring(6);
    } else if (line.StartsWith("Status: ")) {
        item.Status = line.Substring(8);
    } else { // Description
        if (item.Description != null) {
            item.Description += "\r\n";
        }
        item.Description += line;
    }
}

Note that this solution has no error handling. This code assumes that the input text is always well-formed.
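A minimal sketch of the same loop with basic validation added, assuming the same layout (a line starting with "Title ", description lines, then a line starting with "Status: "). The class name CheckedItemParser and the choice of FormatException are illustrative, not part of the answer. It also compares prefixes with StringComparison.Ordinal so the check does not depend on the current culture.

```csharp
using System;
using System.Collections.Generic;

public class Item
{
    public string Title { get; set; }
    public string Status { get; set; }
    public string Description { get; set; }
}

public static class CheckedItemParser
{
    // Same line-by-line approach, but input that does not follow the assumed
    // "Title ... / description ... / Status: ..." layout raises FormatException.
    public static List<Item> Parse(string[] lines)
    {
        var result = new List<Item>();
        Item item = null;

        for (int i = 0; i < lines.Length; i++)
        {
            string line = lines[i];
            if (line.StartsWith("Title ", StringComparison.Ordinal))
            {
                if (item != null && item.Status == null)
                    throw new FormatException("Line " + (i + 1) + ": new title before a Status: line.");
                item = new Item { Title = line.Substring(6) };
                result.Add(item);
            }
            else if (line.StartsWith("Status: ", StringComparison.Ordinal))
            {
                if (item == null || item.Status != null)
                    throw new FormatException("Line " + (i + 1) + ": Status: without a preceding title.");
                item.Status = line.Substring(8);
            }
            else if (item == null || item.Status != null)
            {
                // Outside a record: blank lines are allowed, anything else is an error.
                if (line.Trim().Length != 0)
                    throw new FormatException("Line " + (i + 1) + ": text outside a Title/Status block.");
            }
            else
            {
                // Description line: join with "\r\n" as in the original loop.
                item.Description = item.Description == null
                    ? line
                    : item.Description + "\r\n" + line;
            }
        }

        if (item != null && item.Status == null)
            throw new FormatException("The last item has no Status: line.");
        return result;
    }
}
```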

Upvotes: 1

harmonic

Reputation: 59

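If the content is structured the way you describe, you can buffer lines until you reach a Status: line, which closes the current item:

// requires: using System.Collections.Generic;
// "lines" is assumed to hold the input text split into lines
string myRegEx = "^Status:";
var records = new List<List<string>>(); // one list of lines per item
var buffer = new List<string>();        // lines of the item currently being read

// loop through each line in text
foreach (string line in lines)
{
    if (System.Text.RegularExpressions.Regex.IsMatch(line, myRegEx))
    {
        buffer.Add(line);            // keep the Status: line with its item
        records.Add(buffer);         // save the buffer into the list
        buffer = new List<string>(); // start a fresh buffer (records still references the old one)
    }
    else
    {
        buffer.Add(line);            // save the text into the buffer
    }
}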
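The buffering idea can be taken one step further by turning each buffered record into an item as soon as its Status: line arrives. A sketch, assuming the first non-blank line of each record is its title (so titles need not start with the word Title); BufferedItem and StatusBufferParser are hypothetical names, and it uses an ordinal StartsWith instead of a regex for the Status: check.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical result type; the answer above only collects raw lines.
public class BufferedItem
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Status { get; set; }
}

public static class StatusBufferParser
{
    public static List<BufferedItem> Parse(IEnumerable<string> lines)
    {
        var items = new List<BufferedItem>();
        var buffer = new List<string>();

        foreach (string line in lines)
        {
            if (!line.StartsWith("Status:", StringComparison.Ordinal))
            {
                buffer.Add(line); // title or description line
                continue;
            }

            // A Status: line closes the current record.
            // Assumption: the first non-blank buffered line is the title.
            var body = buffer.SkipWhile(l => l.Trim().Length == 0).ToList();
            items.Add(new BufferedItem
            {
                Title = body.Count > 0 ? body[0].Trim() : "",
                Description = string.Join("\n", body.Skip(1)).Trim(),
                Status = line.Substring("Status:".Length).Trim()
            });
            buffer.Clear(); // safe: body is a separate copy
        }

        // Lines after the last Status: line (if any) are ignored in this sketch.
        return items;
    }
}
```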

Upvotes: 1

Jetti

Reputation: 2458

This is how I would do it (assuming the text is read from a text file):

// requires: using System.IO; using System.Text.RegularExpressions;
Regex regStatus = new Regex(@"^Status:");
Regex regTitle = new Regex(@"^Title ");   // the sample title lines are "Title A", "Title B", ... (no colon)
string line;
string[] descriptionLine;
string[] statusLine;
string[] titleLine;
using (TextReader reader = File.OpenText("file.txt"))
{
    // ReadLine() returns null at the end of the file
    while ((line = reader.ReadLine()) != null)
    {
       if (regStatus.IsMatch(line))
       {
          // status line, split into words; the first element is "Status:" and can be dropped
          statusLine = line.Split(' ');
          // do stuff with array
       }
       else if (regTitle.IsMatch(line))
       {
          // title line, split into words; the first element is "Title" and can be dropped
          titleLine = line.Split(' ');
          // do stuff with array
       }
       else
       {
          // description line, so just split it into words
          descriptionLine = line.Split(' ');
          // do stuff with array
       }
    }
}

You could then store the arrays in a class of your own; I'll leave that part up to you. The code just uses a simple regex to check whether a line starts with "Status:" or "Title". Truth be told, the regex isn't even needed. You could do something like:

if (line.StartsWith("Status:")) { }
if (line.StartsWith("Title ")) { }

This checks whether each line starts with Status or Title.
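Since this answer reads the file line by line, the same idea can be turned into a lazy iterator that yields one item each time a Status: line is read, so a large file never has to be loaded in full. A rough sketch, assuming title lines start with "Title " and joining description lines with "\n"; StreamedItem and StreamingStatusParser are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

// Hypothetical item type; the answer above leaves storage up to the reader.
public class StreamedItem
{
    public string Title { get; set; }
    public string Description { get; set; }
    public string Status { get; set; }
}

public static class StreamingStatusParser
{
    // Reads one line at a time and yields an item whenever a Status: line is seen.
    public static IEnumerable<StreamedItem> Read(TextReader reader)
    {
        string line;
        StreamedItem current = null;
        var description = new StringBuilder();

        // ReadLine returns null at end of input, so no Peek() check is needed.
        while ((line = reader.ReadLine()) != null)
        {
            if (line.StartsWith("Title ", StringComparison.Ordinal))
            {
                // Assumption: an unfinished previous item (no Status: line) is dropped.
                current = new StreamedItem { Title = line.Substring("Title ".Length).Trim() };
                description.Clear();
            }
            else if (line.StartsWith("Status:", StringComparison.Ordinal))
            {
                if (current == null) continue; // assumption: ignore a Status: with no title
                current.Description = description.ToString().Trim();
                current.Status = line.Substring("Status:".Length).Trim();
                yield return current;
                current = null;
            }
            else if (current != null)
            {
                // Append "\n" explicitly so the result doesn't depend on Environment.NewLine.
                description.Append(line).Append('\n');
            }
        }
    }
}
```

Usage would be along the lines of wrapping File.OpenText("file.txt") in a using block and iterating StreamingStatusParser.Read(reader) with foreach; items are produced one at a time as the file is read.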

Upvotes: 1
