user7988893

Match all words not ending with a specified string

echo "xxabc jkl" | grep -onP  '\w+(?!abc\b)'
1:xxabc
1:jkl

Why is the result not as below?

echo "xxabc jkl" | grep -onP  '\w+(?!abc\b)'
1:jkl

The first string is xxabc, which ends with abc.
I want to extract all words that do not end with abc, so why is xxabc matched?
How can I fix it so that the only output is 1:jkl?
Why doesn't '\w+(?!abc\b)' work?

Upvotes: 2

Views: 822

Answers (2)

Casimir et Hippolyte

Reputation: 89557

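Without pcregrep's special features, you can do it by piping through sed first (the trailing \b word boundary, which assumes GNU sed, keeps words such as abcdef intact):

echo "xxabc jkl" | sed 's/[a-zA-Z]*abc\b//g' | grep -onE '[a-zA-Z]+'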

or with awk:

echo "xxabc jkl" | awk -F'[^a-zA-Z]+' '{for(i=1;i<=NF;i++){ if ($i!~/abc$/) printf "%s: %s\n",NR,$i }}'

another approach:

echo "xxabc jkl" | awk -F'([^a-zA-Z]|[a-zA-Z]*abc\\>)+' '{OFS="\n"NR": ";if ($1) printf OFS;$1=$1}1'

Upvotes: 1

Wiktor Stribiżew

Reputation: 626806

The \w+(?!abc\b) pattern matches xxabc because \w+ matches one or more word chars greedily and therefore consumes all of xxabc at once. The negative lookahead (?!abc\b) then makes sure there is no abc followed by a word boundary immediately to the right of the current location. After xxabc comes a space, not abc with a trailing word boundary, so the lookahead passes and the match succeeds.
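
To see which direction the lookahead inspects, the following sketch may help. It assumes a GNU grep built with PCRE support (-P); the inputs xabc and xabcd and the anchored pattern ^x(?!abc\b) are made up for illustration and are not part of the question.

```shell
# Case 1: the pattern from the question. \w+ consumes "xxabc" in one go;
# the text to the right is " jkl", so the negative lookahead passes.
greedy=$(printf '%s\n' "xxabc jkl" | grep -oP '\w+(?!abc\b)')

# Case 2: after the single "x", the text to the right is "abc" followed by
# a word boundary, so the lookahead rejects the match (grep exits with 1).
rejected=$(printf '%s\n' "xabc" | grep -oP '^x(?!abc\b)' || true)

# Case 3: "abc" is followed by "d", so there is no word boundary after it
# and the lookahead passes again.
accepted=$(printf '%s\n' "xabcd" | grep -oP '^x(?!abc\b)')

printf 'greedy:   %s\n' "$(printf '%s' "$greedy" | tr '\n' ' ')"
printf 'rejected: [%s]\n' "$rejected"
printf 'accepted: %s\n' "$accepted"
```

Only the second case is rejected, because only there does abc plus a word boundary sit directly to the right of the position where the lookahead is evaluated.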

To match all words that do not end with abc using a PCRE regex, you may use

echo "xxabc jkl" | grep -onP  '\b\w+\b(?<!abc)'

Details

  • \b - a leading word boundary
  • \w+ - 1 or more word chars
  • \b - a trailing word boundary
  • (?<!abc) - a negative lookbehind that fails the match if the 3 letters immediately to the left of the current location are abc.
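
As a quick check of the lookbehind pattern on a slightly larger input, here is a minimal sketch, again assuming GNU grep with -P. The sample string is hypothetical, and the second pattern, \b(?!\w*abc\b)\w+, is a lookahead-based variant added here for comparison; it is not taken from the answer.

```shell
# Hypothetical sample: the question's two words plus two edge cases
# ("abc" on its own, and "abcd", which merely starts with abc).
input='xxabc jkl abc abcd'

# Lookbehind form: match a whole word, then reject it if the three
# characters just before the end of the word are "abc".
lookbehind=$(printf '%s\n' "$input" | grep -onP '\b\w+\b(?<!abc)')

# Lookahead variant: at the start of a word, refuse to continue if the
# rest of the word ends in "abc"; otherwise consume the word.
lookahead=$(printf '%s\n' "$input" | grep -onP '\b(?!\w*abc\b)\w+')

printf 'lookbehind:\n%s\n' "$lookbehind"
printf 'lookahead:\n%s\n' "$lookahead"
```

Both patterns should report jkl and abcd and skip xxabc and abc. The lookbehind form needs a fixed-length string, which is fine for a literal such as abc.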

Upvotes: 1
