Parag Meshram

Reputation: 8521

Regex to extract line containing term

I've been using this regex

/(?:[^ .,;:]+[ .,;:]+){3}(?:term1|term2)(?:[ .,;:]+[^ .,;:]+){3}/gi

to extract selected terms and the preceding and succeeding 3 words. I'd like to change the regex so that I extract the line containing the selected terms. The line will be bounded by \n but I'd also like to trim leading and trailing spaces.
How do I alter the regex to do that?

example input:

   This line, containing  term2, I'd like to extract.  
        This line contains term13 and I'd like to ignore it  
  This line, on the other hand, contains term1, so let's keep it.

output would be

This line, containing  term2, I'd like to extract.
This line, on the other hand, contains term1, so let's keep it.

See code to be altered below.

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1" />
<title>Untitled Document</title>
</head>

<body>
<script>
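// Sample input: three lines separated by \n, each with leading spaces
var Input = "   This line, containing  term2, I'd like to extract.\n";
Input += "        This line contains term13 and I'd like to ignore it.\n";
Input += "  This line, on the other hand, contains term1, so let's keep it.";

// Current regex: each term plus the 3 words before and after it
var matches = Input.match(/(?:[^ .,;:]+[ .,;:]+){3}(?:term1|term2)(?:[ .,;:]+[^ .,;:]+){3}/gi) || []; // match() returns null when nothing matches
var myMatches = "";
for (var i = 0; i < matches.length; i++) {
  myMatches += "..." + matches[i] + "...\n"; // collect each match on its own line
}
alert(myMatches);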
</script>


</body>
</html>

Upvotes: 0

Views: 2564

Answers (1)

ohaal

Reputation: 5268

As Asad points out, you can use \b for word boundaries; that way term1 won't match term13, for instance.
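A quick way to see the effect of \b, as a minimal sketch (the variable name wholeWord is only illustrative, and the sample strings are made up):

```javascript
// \b matches between a word character (letter, digit, underscore)
// and a non-word character, or at the start/end of the string.
var wholeWord = /\b(?:term1|term2)\b/i;

console.log(wholeWord.test("contains term1, so keep it")); // true
console.log(wholeWord.test("contains term13 and more"));   // false: no boundary between "1" and "3"

// Without the boundaries, term1 is found inside term13.
console.log(/term1|term2/i.test("contains term13 and more")); // true
```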

The regex:

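^ *(.*\b(?:term1|term2)\b.*?) *$

should do what you're after. The final .*? is lazy so that trailing spaces stay outside the group, and in JavaScript the regex needs the m flag so that ^ and $ match at each \n (plus g and i, as in your original). Your matches will be in the first (and only) capture group. Simply loop through them and you're done.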

See it on rubular.
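For the JavaScript in the question, looping over the capture group could look roughly like the sketch below. The function name extractLines is hypothetical, the input mirrors the example from the question, and the final quantifier in the group is lazy (.*?) so that trailing spaces stay outside the capture. It only trims spaces, not tabs.

```javascript
// Hypothetical helper: returns the trimmed lines that contain
// term1 or term2 as a whole word.
function extractLines(text) {
  // g: find every matching line; i: ignore case;
  // m: let ^ and $ match at each line break instead of only at the
  // start and end of the whole string.
  var re = /^ *(.*\b(?:term1|term2)\b.*?) *$/gim;
  var lines = [];
  var match;
  while ((match = re.exec(text)) !== null) {
    lines.push(match[1]); // first (and only) capture group
  }
  return lines;
}

var input =
  "   This line, containing  term2, I'd like to extract.  \n" +
  "        This line contains term13 and I'd like to ignore it  \n" +
  "  This line, on the other hand, contains term1, so let's keep it.";

console.log(extractLines(input).join("\n"));
// This line, containing  term2, I'd like to extract.
// This line, on the other hand, contains term1, so let's keep it.
```

exec() is used instead of match() here because match() with the g flag returns only the full matches, leading and trailing spaces included, and drops the capture groups.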

Upvotes: 2
