Bulki

Reputation: 741

C# - find a line (regex) in a file and get the complete block of text according to another regex

I want to find a regex match in a text file and get the complete block of text that starts at the match.

Text example:

text text text text text text text text text 
!
title
text text text text text text text text text text text text text text text 
text text text text text text text text text text text text text text text 
text text text text text text text text text text text text text text text 
!
text text text text text text text text text 

Finding the "title" part is easy, but I want to get the following result:

title
text text text text text text text text text text text text text text text 
text text text text text text text text text text text text text text text 
text text text text text text text text text text text text text text text 

What is the best way to go: working with a regex pattern, or selecting the text until I reach a "!"? (I want simple, fast, readable code.)

Code for finding a pattern (with rtxtText as the RichTextBox):

    private String searchInfo(String pattern)
    {
        String text = rtxtText.Text;
        Regex regExp = new Regex(pattern);
        String result = "";

        foreach (Match match in regExp.Matches(text))
        {
            result += "\n" + match.ToString();
        }
        return result; 
    }

Upvotes: 2

Views: 7995

Answers (3)

Sergey Berezovskiy

Reputation: 236248

public IEnumerable<string> ParseParagraphs(string text)
{
    Regex regex = new Regex(@"title[^!]*");
    foreach (Match match in regex.Matches(text))
        yield return match.Value;  
}

Usage is simple:

foreach (var p in ParseParagraphs(your_text))
    Console.WriteLine(p);

UPDATE: Use StringBuilder in your SearchInfo method to avoid creating many strings in memory

private string SearchInfo(String pattern)
{            
    MatchCollection matches = Regex.Matches(rtxtText.Text, pattern);
    if (matches.Count == 0)
        return String.Empty;

    StringBuilder sb = new StringBuilder();
    foreach (Match match in matches)
        sb.AppendLine(match.Value);

    return sb.ToString();
}

And call it this way: var result = SearchInfo(@"title[^!]*");
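
One caveat: `[^!]*` stops at the first `!` anywhere, so a block whose text contains an exclamation mark gets cut short. If that can happen in your file, a variation that only treats a `!` on a line of its own as the delimiter might look like the sketch below. The pattern and the names `DelimitedBlocks` and `Find` are my own assumptions for illustration, not something tested against your real data.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class DelimitedBlocks
{
    // Multiline: ^ and $ match at line boundaries.
    // Singleline: . also matches '\n', so the lazy .*? can span lines.
    // The lookahead stops the match right before a line that is only "!".
    private static readonly Regex BlockRegex = new Regex(
        @"^title.*?(?=^!\r?$)",
        RegexOptions.Multiline | RegexOptions.Singleline);

    // Hypothetical helper: returns each block without its trailing line break.
    public static List<string> Find(string text)
    {
        var result = new List<string>();
        foreach (Match m in BlockRegex.Matches(text))
            result.Add(m.Value.TrimEnd('\r', '\n'));
        return result;
    }
}
```

With this pattern a block that has no closing `!` line is not matched at all, which may or may not be what you want.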

Upvotes: 1

Hinek

Reputation: 9729

Your regex should be changed to match the unknown characters as well, like

  • first title
  • then [^!]* ([^ ] means any character not in this set, so [^!]* matches any number of characters other than !)

    Regex regex = new Regex("title[^!]*", RegexOptions.Singleline);
    MatchCollection matches = regex.Matches(text);
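
To see what this pattern captures, here is a minimal sketch run against a shortened version of the sample text from the question. The class name `BlockDemo` and the method `FindBlocks` are made up for illustration. Note that `[^!]` also matches line breaks, so the match runs across lines on its own and ends just before the next `!`; the trailing line break is trimmed here.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class BlockDemo
{
    // Hypothetical helper: returns every block that starts at "title"
    // and runs up to (not including) the next '!'.
    public static List<string> FindBlocks(string text)
    {
        var blocks = new List<string>();
        foreach (Match match in Regex.Matches(text, @"title[^!]*"))
        {
            // The match ends with the line break before '!', so trim it.
            blocks.Add(match.Value.TrimEnd('\r', '\n'));
        }
        return blocks;
    }

    public static void Main()
    {
        string text = "intro text\n!\ntitle\nfirst line\nsecond line\n!\nclosing text";
        foreach (string block in FindBlocks(text))
            Console.WriteLine(block);
    }
}
```

For the sample string in `Main` this prints the `title` line followed by the two text lines, without the surrounding `!` lines.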

Upvotes: 4

Daren Thomas

Reputation: 70324

The best way is just to loop over the lines of text until you find the first '!' and then collect until you find the next (shown here in Python, with textfile being an already opened file object):

line = textfile.readline()
while line and line.strip() != '!':
    line = textfile.readline()  # skip until first '!'
title = textfile.readline()  # now on title line
text = ''
line = textfile.readline()
while line and line.strip() != '!':
    text += line
    line = textfile.readline()
print(title)
print(text)
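
Since the question is about C#, the same idea could be sketched there as well. This version differs slightly from the loop above: it looks for the first line that matches a title regex (instead of taking the line after the first `!`) and then collects lines until the next line that is only `!`. The names `LineScanner` and `ReadBlock` are assumptions for illustration; any `TextReader` works, for example a `StringReader` over the text box content or a `StreamReader` over the file.

```csharp
using System.IO;
using System.Text;
using System.Text.RegularExpressions;

public static class LineScanner
{
    // Hypothetical helper: returns the first line matching titlePattern plus
    // all following lines up to the next line that is just "!".
    // Returns null when no line matches.
    public static string ReadBlock(TextReader reader, Regex titlePattern)
    {
        string line;
        // Skip lines until one matches the title pattern.
        while ((line = reader.ReadLine()) != null && !titlePattern.IsMatch(line))
        {
        }
        if (line == null)
            return null;

        var block = new StringBuilder(line);
        // Collect lines until the delimiter or the end of the input.
        while ((line = reader.ReadLine()) != null && line.Trim() != "!")
            block.Append('\n').Append(line);

        return block.ToString();
    }
}
```

A call could look like `LineScanner.ReadBlock(new StringReader(rtxtText.Text), new Regex("^title"))`.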

Upvotes: 1
