Pyr0Sh4rk

Reputation: 49

Regular Expression to match first word with a character in each line

I am trying to write a regex that finds the first word in each line that contains the character a. For a string like:

The cat ate the dog
and the mouse

The expression should find "cat" and "and".
So far, I have:

/\b\w*a\w*\b/g

However, this returns every matching word ("cat", "ate" and "and"), not just the first one in each line.
What is the easiest way to return only the first occurrence in each line?
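A minimal JavaScript sketch of the difference, assuming the text can be split into lines first: with the g flag, String.prototype.match returns every word containing "a", while without it match returns only the first match, so applying the pattern to each line separately yields one word per line. The helper name firstMatchPerLine is hypothetical, not from any answer below.

```javascript
// Hypothetical helper: run a non-global pattern against each line separately,
// so match() returns only the first match on that line (or null if none).
function firstMatchPerLine(text, pattern) {
  return text
    .split(/\r?\n/)
    .map((line) => {
      const m = line.match(pattern);
      return m ? m[0] : null;
    })
    .filter((word) => word !== null); // drop lines without any match
}

const sample = "The cat ate the dog\nand the mouse";
console.log(sample.match(/\b\w*a\w*\b/g)); // [ 'cat', 'ate', 'and' ]
console.log(firstMatchPerLine(sample, /\b\w*a\w*\b/)); // [ 'cat', 'and' ]
```

This sidesteps the single-regex question by doing the per-line iteration in code; the answers below solve it within the pattern itself.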

Upvotes: 3

Views: 722

Answers (3)

Cary Swoveland

Reputation: 110665

The text could be matched with the regular expression

(?=(\b[a-z]*a[a-z]*\b)).*\r?\n

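with the multiline and case-insensitive flags set. For each match, capture group 1 contains the first word (composed only of letters) in a line that contains an "a". There are no matches in lines that do not contain such a word. Note that .*\r?\n requires a line terminator, so the last line is only matched if the text ends with a newline; replacing \r?\n with (?:\r?\n|$) removes that restriction.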


The expression can be broken down as follows.

(?=                # begin a positive lookahead
  (                # begin capture group 1
    \b             # match a word boundary
    [a-z]*a[a-z]*  # match a word containing an "a"
    \b             # match a word boundary
  )                # end capture group 1
)                  # end positive lookahead
.*\r?\n            # match the remainder of the line including the
                   # line terminator
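A minimal JavaScript sketch of this lookahead approach, with one assumed change: the line terminator is written as (?:\r?\n|$) so that a final line without a trailing newline also matches. Only the i flag is used, since the pattern has no ^ or $ anchors that the multiline flag would affect. The function name firstWordsLookahead is hypothetical.

```javascript
// Lookahead pattern from this answer, with (?:\r?\n|$) instead of \r?\n
// (assumed tweak) so the last line needs no trailing newline.
const LOOKAHEAD_RE = /(?=(\b[a-z]*a[a-z]*\b)).*(?:\r?\n|$)/gi;

// Hypothetical helper: each match consumes one whole line, and group 1 holds
// the first all-letter word containing "a" found by the lookahead.
function firstWordsLookahead(text) {
  return Array.from(text.matchAll(LOOKAHEAD_RE), (m) => m[1]);
}

console.log(firstWordsLookahead("The cat ate the dog\nand the mouse")); // [ 'cat', 'and' ]
```

Because each match swallows the rest of its line, the engine cannot find a second word on the same line.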

Upvotes: 2

The fourth bird

Reputation: 163207

If you want to match a word using \w, you could also use a negated character class that matches any character except a or a newline.

Then match a word that contains at least one a character, preceded by a word boundary \b:

^[^a\n\r]*\b([^\Wa]*a\w*)

The pattern matches:

  • ^ Start of the string (start of each line with the multiline flag)
  • [^a\n\r]*\b Optionally match any characters except a or a newline, then assert a word boundary
  • ( Capture group 1
    • [^\Wa]*a\w* Optionally match word characters other than a, then match a followed by optional word characters
  • ) Close group 1
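A minimal JavaScript sketch of the \b version, assuming the g and m flags so that ^ anchors at the start of every line and matchAll yields at most one match per line. The function name firstWordsBoundary is hypothetical.

```javascript
// Pattern from this answer; m makes ^ match at each line start,
// g is required by matchAll.
const WORD_BOUNDARY_RE = /^[^a\n\r]*\b([^\Wa]*a\w*)/gm;

// Hypothetical helper: collect capture group 1 from each line that matches.
function firstWordsBoundary(text) {
  return Array.from(text.matchAll(WORD_BOUNDARY_RE), (m) => m[1]);
}

console.log(firstWordsBoundary("The cat ate the dog\nand the mouse")); // [ 'cat', 'and' ]
console.log(firstWordsBoundary("x foo_bar1 baz")); // [ 'foo_bar1' ]
```

The second call shows that \w also accepts digits and underscores, so "foo_bar1" counts as one word.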


Using whitespace boundaries on the left and right:

^[^a\n\r]*(?<!\S)([^\Wa]*a\w*)(?!\S)
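A minimal JavaScript sketch of the whitespace-bounded variant, again assuming the g and m flags (lookbehind requires an engine with ES2018 support). The function name firstWordsWhitespace is hypothetical.

```javascript
// Whitespace-bounded variant: (?<!\S) and (?!\S) require whitespace
// (or a line/string edge) on both sides of the captured word.
const WHITESPACE_RE = /^[^a\n\r]*(?<!\S)([^\Wa]*a\w*)(?!\S)/gm;

// Hypothetical helper: collect capture group 1 from each matching line.
function firstWordsWhitespace(text) {
  return Array.from(text.matchAll(WHITESPACE_RE), (m) => m[1]);
}

console.log(firstWordsWhitespace("The cat ate the dog\nand the mouse")); // [ 'cat', 'and' ]
console.log(firstWordsWhitespace("e-mail cat")); // []
```

The second call finds nothing: [^a\n\r]* cannot move past the a in "e-mail", and "mail" is not preceded by whitespace, so the whole line is skipped rather than falling through to "cat".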


Upvotes: 2

JvdV

Reputation: 75840

Assuming you are only looking for words without digits or underscores (\w would include those), I'd suggest:

(?i)^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)

Then use whatever is in the first capture group. Or, if your regex flavor supports \K (PCRE does, for example):

(?i)^.*?\K(?<!\S)[b-z]*a[a-z]*(?!\S)


Note that I used lookarounds to assert that the word is delimited only by whitespace (or the start or end of the line). You could also replace those lookarounds with word boundaries \b. Also, depending on your application, you can probably move the inline case-insensitive modifier (?i) to a flag. For example, in JavaScript you would probably use /^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)/gmi. See for example:

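const myString = "The cat ate the dog\nand the mouse";
// Use a regex literal: in a string passed to new RegExp, "\S" would lose
// its backslash and match a literal "S" instead of a non-whitespace char
const myRegexp = /^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)/gmi;
let m;
while ((m = myRegexp.exec(myString)) !== null) {
  console.log(m[1]); // logs "cat", then "and"
}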
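The remark above about swapping the lookarounds for \b matters when punctuation touches a word. A minimal JavaScript sketch comparing both forms of this answer's pattern, assuming matchAll is available (Node 12+ or a modern browser); the helper name firstWords and the sample text are hypothetical.

```javascript
// Lookaround version (whitespace must surround the word) vs. \b version
// (any non-word character, such as punctuation, may border the word).
const LOOKAROUND_RE = /^.*?(?<!\S)([b-z]*a[a-z]*)(?!\S)/gmi;
const BOUNDARY_RE = /^.*?\b([b-z]*a[a-z]*)\b/gmi;

// Hypothetical helper: matchAll clones the regex, so the shared
// global patterns above can be reused safely.
function firstWords(text, re) {
  return Array.from(text.matchAll(re), (m) => m[1]);
}

const text = "Hi, cat.\nand the mouse";
console.log(firstWords(text, LOOKAROUND_RE)); // [ 'and' ]
console.log(firstWords(text, BOUNDARY_RE)); // [ 'cat', 'and' ]
```

With the lookarounds, "cat." is rejected because the period is not whitespace, so the first line yields nothing; with \b, "cat" is found.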

Upvotes: 2
