reddy

Reputation: 1821

Match an expression but don't match lines that start with a #

I am using Qt. I have a text string in which I look for a function call of the form xyz.set_name(). I want to capture the last occurrence of this call, but skip it if the line that contains it starts with a #. So far my regex matches the function call, but I don't know how to exclude the lines that start with # or how to capture only the last occurrence, and I don't understand why all the matches end up in one capture group.

[().\w\d]+.set_name\(\)\s*

This is what I want it to do

abc.set_name() // match
# abc.set_name() // don't match
xyz.set_name() // match and capture this one

Update for more clarification:

My text reads like this when printed with qDebug:

Hello\nx=y*2\nabc.set_name()   \n#xyz.set_name()

It is one long string, with \n as the newline character.

Update: here is a longer test string. I have tried all the suggested regexes on it, but none of them worked, and I don't know what is missing. https://regex101.com/r/vXpXIA/1

Update 2: Scratch my first update. The \n is a qDebug() artifact and doesn't need to be considered when using the regex.

Upvotes: 1

Views: 293

Answers (3)

Cary Swoveland

Reputation: 110725

If you merely want to match the last line that matches the pattern

^[a-z]+\.set_name\(\)

you can use the following regular expression:

(?smi)^[a-z]+\.set_name\(\)(?!.*^[a-z]+\.set_name\(\))

For simplicity I've used the character class [a-z]. That can be changed to suit requirements. In the question it is [().\w\d], which can be simplified to [().\w].

Note that since the substring of interest is being matched there is no point to capturing it as well. The fact that one of the lines prior to the last one begins with '#' is not relevant. All that matters is whether the lines match a specified pattern.

Start your engine!

The PCRE regex engine performs the following operations.

(?smi)                  : set single-line, multi-line and case-insensitive
                          modes
^                       : match the beginning of a line
[a-z]+\.set_name\(\)    : match 1+ chars in the char class, followed
                          by '.set_name()'
(?!                     : begin negative lookahead
.*^[a-z]+\.set_name\(\) : match 0+ chars (including newlines), the
                          beginning of a line, 1+ letters, '.set_name()'
)                       : end negative lookahead

Recall that single-line mode causes . to match newlines and multi-line mode causes ^ and $ to match the beginning and ends of lines (rather than the beginning and end of the string).
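
For readers who want to try the idea outside PCRE, here is a minimal sketch using standard C++ std::regex. It is an adaptation, not the regex above verbatim: std::regex has no inline (?smi) flags, so the sketch assumes [\s\S] in place of a newline-matching dot, (?:^|\n) in place of a multi-line ^, and the icase flag for case-insensitivity. The function names are hypothetical. Qt's QRegularExpression is PCRE-based, so the original pattern should be usable there more or less as written.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns the last "<letters>.set_name()" that begins a
// line, or an empty string if no line starts with such a call.
std::string lastLineStartCall(const std::string& text)
{
    // (?:^|\n)   : start of the input or a newline (stands in for multi-line ^)
    // ( ... )    : group 1, the call itself
    // (?! ... )  : no later line may start with another such call
    static const std::regex re(
        R"((?:^|\n)([a-z]+\.set_name\(\))(?![\s\S]*\n[a-z]+\.set_name\(\)))",
        std::regex::ECMAScript | std::regex::icase);
    std::smatch m;
    return std::regex_search(text, m, re) ? m[1].str() : std::string();
}

// Usage on the three-line sample from the question.
void checkLastLineStartCall()
{
    const std::string sample =
        "abc.set_name()\n# abc.set_name()\nxyz.set_name()";
    assert(lastLineStartCall(sample) == "xyz.set_name()");
}
```

Because the pattern only accepts letters at the start of a line, a line beginning with # never matches in the first place, which is the point made above.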

Upvotes: 1

Wiktor Stribiżew

Reputation: 627087

You may use

(?s).*\n(?!\h*#)\h*([\w().]+\.set_name\(\))

See the regex demo, your match is in Group 1. Details:

  • (?s) - DOTALL mode on, . now matches any chars
  • .* - any zero or more chars as many as possible
  • \n(?!\h*#) - a newline that is not immediately followed with 0 or more horizontal whitespaces and then a # char
  • \h* - 0+ horizontal whitespaces
  • ([\w().]+\.set_name\(\)) - Capturing group 1:
    • [\w().]+ - 1 or more word chars, ), ( or .
    • \.set_name\(\) - a .set_name() string.
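
A rough way to try this with standard C++ std::regex is sketched below. It is an adaptation under stated assumptions: std::regex supports neither (?s) nor \h, so [\s\S] stands in for the DOTALL dot and [ \t] for horizontal whitespace. The function names are hypothetical. Note that, like the pattern above, it requires a newline before the call, so a call on the very first line is not found.

```cpp
#include <cassert>
#include <regex>
#include <string>

// Hypothetical helper: returns group 1 of the adapted pattern, i.e. the last
// call on a line that does not begin with optional whitespace and '#'.
std::string lastUncommentedCall(const std::string& text)
{
    // [\s\S]*        : greedy, so the engine backtracks from the end of the text
    // \n(?![ \t]*#)  : a newline not followed by optional blanks and '#'
    // [ \t]*         : leading blanks before the call
    // ( ... )        : group 1, the call itself
    static const std::regex re(
        R"([\s\S]*\n(?![ \t]*#)[ \t]*([\w().]+\.set_name\(\)))");
    std::smatch m;
    return std::regex_search(text, m, re) ? m[1].str() : std::string();
}

// Usage on the qDebug-style string from the question.
void checkLastUncommentedCall()
{
    const std::string sample =
        "Hello\nx=y*2\nabc.set_name()   \n#xyz.set_name()";
    assert(lastUncommentedCall(sample) == "abc.set_name()");
}
```

The greedy prefix is what selects the last occurrence: the engine consumes the whole text first and gives characters back until the rest of the pattern fits.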

Upvotes: 0

Roy2511

Reputation: 1038

You need a regex lookahead inside a conditional construct (if your regex engine supports them). This should work:

(?(?=^[^#])(^\s*[a-zA-Z]+\.set_name\(\))|z^)

Explanation:

  • (?(?=patt)then|else) - Regex if-else construct, if regex matches given pattern patt, then is matched, otherwise else is matched

  • patt = ^[^#] -- at the start of the line, no #

  • then part - if patt is true -- ^\s*[a-zA-Z]+\.set_name\(\) matches any amount of whitespace followed by <something>.set_name(), where <something> is a variable name.

  • else part -- If patt is false -- match z^ which is z coming before start of line, which isn't possible.


Edit: just realised you can have digits in variable names (but it cannot start with one). In that case, improved regex (not tested)

(?(?=^[^#])(^\s*[a-zA-Z]+[a-zA-Z\d]*\.set_name\(\))|z^)

Edit: Since you also have newline characters in your string, it doesn't match the problem description in your question. Nevertheless, simple enough to deal with by just tokenising the string.

Just split the string on newlines, then walk the lines from last to first.

#include <iostream>
#include <string>
#include <sstream>
#include <vector>

int main()
{
    std::istringstream isr;
    isr.str("I am John\n today is  \n#abc.set_name()\n");
    std::string tok;
    std::vector<std::string> vs;
    while(std::getline(isr, tok))
    {
        std::cout << tok << std::endl;
        vs.push_back(tok);
    }
    
    for (auto r_it = vs.rbegin(); r_it != vs.rend(); ++r_it)
    {
        std::cout << *r_it << std::endl;
        // if match then break from loop
    }
}
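
The loop above leaves the actual match as a comment. One possible way to fill it in is sketched below, assuming standard C++ std::regex applied to each line; the helper names and the exact identifier pattern are assumptions, not part of the original answer.

```cpp
#include <cassert>
#include <regex>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: returns the last "<name>.set_name()" call found at the
// start of a line, or an empty string if there is none.
std::string lastSetNameCall(const std::string& text)
{
    // An identifier (letter or underscore first, then word characters)
    // followed by ".set_name()". Lines starting with '#' cannot match,
    // because only whitespace may precede the identifier.
    static const std::regex call(R"(^\s*([A-Za-z_]\w*\.set_name\(\)))");

    // Split the text into lines.
    std::istringstream in(text);
    std::vector<std::string> lines;
    for (std::string line; std::getline(in, line);)
        lines.push_back(line);

    // Walk the lines from last to first and stop at the first match.
    for (auto it = lines.rbegin(); it != lines.rend(); ++it)
    {
        std::smatch m;
        if (std::regex_search(*it, m, call))
            return m[1].str();
    }
    return std::string();
}

// Usage on the three-line sample from the question.
void checkLastSetNameCall()
{
    const std::string sample =
        "abc.set_name()\n# abc.set_name()\nxyz.set_name()";
    assert(lastSetNameCall(sample) == "xyz.set_name()");
}
```

Matching line by line keeps the regex simple: no lookahead or multi-line flags are needed, and the reverse iteration takes care of finding the last occurrence.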


Upvotes: 0
