MPicazo

Reputation: 711

Regular Expression with no LookBehind feature

I am trying to write a regular expression that finds all ';' characters that are not followed by a NEW LINE (\n) character:

;(?!\n)

and all NEW LINE (\n) characters that are not preceded by a ';' character:

(?<!;)\n

Unfortunately I am using Qt 4.7.4 QRegExp and it does not support "Look Behind". How do I rewrite the regular expression above so it doesn't use "Look Behind"?

Upvotes: 1

Views: 1877

Answers (2)

Duncan

Reputation: 1

Perl's lookbehind assertions, "independent" subexpressions and conditional expressions are not supported.

From http://doc.qt.io/archives/qt-4.8/qregexp.html

So (?<!;)\n does not work,
and (?!;)\n will match every newline character
regardless of whether it is preceded by a ;, because the lookahead only inspects the character at the newline's own position.

Upvotes: 0

phyatt

Reputation: 19102

Quoting from the documentation:

http://doc.qt.digia.com/4.7/qregexp.html#details

Both zero-width positive and zero-width negative lookahead assertions (?=pattern) and (?!pattern) are supported with the same syntax as Perl.

What is probably happening is that you are running on a Windows machine that has inserted \r\n instead of just \n, or the text file was created on a Windows machine. In that case the character right after the ';' is \r, so a lookahead for \n alone never sees the line break.
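To illustrate the \r\n point, here is a minimal sketch of a lookahead that tolerates both line endings. It uses the standard library's std::regex rather than QRegExp so it compiles without Qt; the function name semicolonsNotBeforeNewline is made up for this example, and I have not tested the pattern against Qt 4.7 itself. It relies only on a negative lookahead, which the documentation quoted above says QRegExp supports.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of every ';' that is not
// immediately followed by a line break ("\n" or "\r\n").
std::vector<std::size_t> semicolonsNotBeforeNewline(const std::string &text)
{
    // Negative lookahead only, no lookbehind. The optional \r lets the
    // same pattern handle Windows line endings.
    static const std::regex rx(";(?!\\r?\\n)");

    std::vector<std::size_t> offsets;
    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it)
        offsets.push_back(static_cast<std::size_t>(it->position(0)));
    return offsets;
}
```

For the input "a;\r\nb; c;\nd;" this reports the semicolons at offsets 5 and 11 and skips the two that sit directly in front of a line break.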

One thing to watch out for with lookbehinds in general is that most regex engines that do support them do not allow a variable-length lookbehind.

If lookbehinds/lookaheads are still giving you trouble, the other option to look into is using capture groups, and then refer only to the capture group you are interested in.

The code examples section of the docs has this:

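QRegExp rx;  // declared here so the snippet stands alone
QString str = "Nokia Corporation\tqt.nokia.com\tNorway";
QString company, web, country;
rx.setPattern("^([^\t]+)\t([^\t]+)\t([^\t]+)$");
if (rx.indexIn(str) != -1) {
    company = rx.cap(1);  // first parenthesised group
    web = rx.cap(2);      // second group
    country = rx.cap(3);  // third group
}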

A capture group is defined with parentheses and is later accessed by its index, starting at 1. Index zero is the entire match (not broken into capture groups).
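Applied to the original question, the capture-group idea could look like the sketch below: instead of a lookbehind, capture the character in front of the newline and work out the newline's offset from that group. This is written against std::regex (whose default grammar also has no lookbehind) so it compiles without Qt. The function name newlinesNotAfterSemicolon is made up, and I am assuming the same pattern carries over to QRegExp, since it only uses a negated character class and a positive lookahead.

```cpp
#include <cstddef>
#include <regex>
#include <string>
#include <vector>

// Hypothetical helper: returns the offset of every '\n' that is not
// immediately preceded by ';', without using a lookbehind.
std::vector<std::size_t> newlinesNotAfterSemicolon(const std::string &text)
{
    std::vector<std::size_t> offsets;

    // A newline at offset 0 has no preceding character, so it cannot be
    // preceded by ';'. Handle it separately from the regex.
    if (!text.empty() && text[0] == '\n')
        offsets.push_back(0);

    // Group 1 captures one non-';' character. The lookahead requires a
    // newline right after it but does not consume it, so a run of
    // consecutive newlines is still found one by one.
    static const std::regex rx("([^;])(?=\\n)");

    for (std::sregex_iterator it(text.begin(), text.end(), rx), end; it != end; ++it) {
        // The newline sits directly after the captured character.
        offsets.push_back(static_cast<std::size_t>(it->position(1) + it->length(1)));
    }
    return offsets;
}
```

The simpler consuming pattern ([^;])(\n) from the test program further down also works, but it swallows the newline, so the second newline of a blank line can be missed; the lookahead avoids that.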

http://doc.qt.digia.com/4.7/qregexp.html#cap

http://doc.qt.digia.com/4.7/qregexp.html#capturedTexts

Hope that helps. Regular Expressions can be a lot of fun when they are working right. Good luck.

I also like using this tool. The syntax may be a little different from QRegExp, but it is pretty quick to translate and test once you have it.

UPDATE: Here is a full test program showing four different patterns and what they find with QRegExp:

#include <QCoreApplication>
#include <QRegExp>
#include <QString>
#include <QDebug>
#include <QStringList>

int main(int argc, char *argv[])
{
    QCoreApplication a(argc, argv);

    QString str =
            "This is a long string;\n"
            "with some semi colons;\n"
            "sometimes followed by a new line;\n"
            "and other times followed; by something else.\n"

            "(;)([^\\n]) find a semicolon and a new line\n"
            "(;)(?!\\n)  find a semicolon not followed by a new line, negative look-ahead\n"

            "([^;])(\\n) find a non semicolon and a new line\n"
            "(?<!;)(\\n) find a new line, not preceeded by a semicolon.\n";

    QList <QRegExp> rx_list;

    QRegExp rx_colon_and_non_newline;
    rx_colon_and_non_newline.setPattern("(;)([^\\n])");

    QRegExp rx_colon_and_neg_lookahead;
    rx_colon_and_neg_lookahead.setPattern("(;)(?!\\n)");

    QRegExp rx_non_colon_and_newline;
    rx_non_colon_and_newline.setPattern("([^;])(\\n)");

    QRegExp rx_neg_lookbehind_and_newline;
    rx_neg_lookbehind_and_newline.setPattern("(?<!;)(\\n)");

    rx_list << rx_colon_and_non_newline
            << rx_colon_and_neg_lookahead
            << rx_non_colon_and_newline
            << rx_neg_lookbehind_and_newline;

    foreach(QRegExp rx, rx_list)
    {
        int count = 0;
        int pos = 0;
        qDebug() << "Pattern" << rx.pattern();
        while ((pos = rx.indexIn(str, pos)) != -1) {
            QStringList capturedTexts(rx.capturedTexts());

            for(int i = 0; i<capturedTexts.size(); i++)
                capturedTexts[i].replace('\n',"\\n");

            qDebug() << "\t" << count << "Found at position" << pos << capturedTexts;
            // qDebug() << rx.cap();
            pos += rx.matchedLength();
            ++count;
        }
        if(count == 0)
            qDebug() << "\tNo matches found.";
    }


    return a.exec();
}

output:

Pattern "(;)([^\n])"
         0 Found at position 104 ("; ", ";", " ")
         1 Found at position 126 (";)", ";", ")")
         2 Found at position 169 (";)", ";", ")")
         3 Found at position 247 (";]", ";", "]")
         4 Found at position 295 (";)", ";", ")")
Pattern "(;)(?!\n)"
         0 Found at position 104 (";", ";")
         1 Found at position 126 (";", ";")
         2 Found at position 169 (";", ";")
         3 Found at position 247 (";", ";")
         4 Found at position 295 (";", ";")
Pattern "([^;])(\n)"
         0 Found at position 123 (".\n", ".", "\n")
         1 Found at position 166 ("e\n", "e", "\n")
         2 Found at position 242 ("d\n", "d", "\n")
         3 Found at position 289 ("e\n", "e", "\n")
         4 Found at position 347 (".\n", ".", "\n")
Pattern "(?<!;)(\n)"
        No matches found.

Upvotes: 1
