MarkokraM

Reputation: 990

RegEx - Select words that are not tag names or attributes

Is it possible to select all words that are neither tags nor attributes inside tags (and are not inside the {...} caption blocks)? I have the inverse working, and I know I could do this in two phases: replace the first set of matches, then run a second JavaScript RegExp search. But I'd like to get it to work with a single expression.

http://regexr.com/3cb6g

(<[^>]*>)|({[^>]*})
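To see what the existing inverse pattern picks up, here is a minimal Node.js sketch that runs it with the g flag over the second sample line from the input; the variable names are only for illustration.

```javascript
// The inverse pattern from the question: whole tags <...> or {...} caption blocks.
const inverse = /(<[^>]*>)|({[^>]*})/g;

// Second line of the sample input.
const line = '<p>Second image: <img scr="./image2.png" alt="image title" title="image title">asdf</img>{caption width="300" style="height:\'300px\'"} </p>';

// With the g flag, match() returns every full match as an array of strings.
const tagsAndCaptions = line.match(inverse);
console.log(tagsAndCaptions);
```

One detail worth knowing: [^>]* in the {...} alternative only stops at a >, so a caption block extends to the last } before the next >; {[^}]*} would stop at the first } instead.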

Input:

<p>Test image captions for GitBook:</p>

<p>Second image: <img scr="./image2.png" alt="image title" title="image title">asdf</img>{caption width="300" style="height:'300px'"} </p>

<p>Sample text and first image: <img scr="./image1.png" alt="image 1" /> {caption width="300" style="height:'300px'"} for testing ok...</p>

Expected output, with each word that should be matched wrapped in backticks:

<p>`Test` `image` `captions` `for` `GitBook`:</p>

<p>`Second` `image`: <img scr="./image2.png" alt="image title" title="image title">`asdf`</img>{caption width="300" style="height:'300px'"} </p>

<p>`Sample` `text` `and` `first` `image`: <img scr="./image1.png" alt="image 1" /> {caption width="300" style="height:'300px'"} `for` `testing` `ok`...</p>
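The two-phase approach mentioned in the question can be sketched as follows. This is a minimal sketch, not the asker's code: wordsOutsideTags is a hypothetical helper name, phase 1 reuses the inverse pattern above to blank out tags and {...} blocks, and phase 2 is a plain \w+ search over what is left.

```javascript
// Hypothetical helper sketching the two-phase approach:
// phase 1 blanks out tags and {...} blocks, phase 2 searches for words.
function wordsOutsideTags(html) {
  // Replace each tag or caption block with a space so neighbouring words stay apart.
  const stripped = html.replace(/(<[^>]*>)|({[^>]*})/g, " ");
  // \w+ matches runs of letters, digits and underscores; || [] covers "no match".
  return stripped.match(/\w+/g) || [];
}

// Sample input copied from the question.
const input = [
  '<p>Test image captions for GitBook:</p>',
  '<p>Second image: <img scr="./image2.png" alt="image title" title="image title">asdf</img>{caption width="300" style="height:\'300px\'"} </p>',
  '<p>Sample text and first image: <img scr="./image1.png" alt="image 1" /> {caption width="300" style="height:\'300px\'"} for testing ok...</p>',
];

input.forEach((l) => console.log(wordsOutsideTags(l)));
```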

Upvotes: 1

Views: 59

Answers (3)

guest271314

Reputation: 1

Try using .textContent and String.prototype.replace() with the RegExp /\{.*\}|:|\.+|\s{2}|\s$/gi:

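// Collect each <p>'s text, then strip {...} blocks, colons, runs of dots,
// pairs of whitespace characters and a trailing whitespace character
var p = document.getElementsByTagName("p"), res = [];
for (var i = 0; i < p.length; i++) {
  res[i] = p[i].textContent.replace(/\{.*\}|:|\.+|\s{2}|\s$/gi, "");
}
console.log(res);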
<!--
<p>`Test` `image` `captions` `for` `GitBook`:</p>

<p>`Second` `image`: <img scr="./image2.png" alt="image title" title="image title">`asdf`</img>{caption width="300" style="height:'300px'"} </p>

<p>`Sample` `text` `and` `first` `image`: <img scr="./image1.png" alt="image 1" /> {caption width="300" style="height:'300px'"} `for` `testing` `ok`...</p>
-->
<p>Test image captions for GitBook:</p>

<p>Second image: <img scr="./image2.png" alt="image title" title="image title">asdf</img>{caption width="300" style="height:'300px'"} </p>

<p>Sample text and first image: <img scr="./image1.png" alt="image 1" /> {caption width="300" style="height:'300px'"} for testing ok...</p>
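To check what the replace() call in this answer does without a browser, the sketch below applies the same RegExp to plain strings. It assumes these strings are what textContent returns for the three sample paragraphs (tags dropped, the {caption ...} text kept as ordinary text); splitting on spaces afterwards is an addition for illustration.

```javascript
// Assumption: these are the textContent values of the three sample <p> elements.
const texts = [
  "Test image captions for GitBook:",
  "Second image: asdf{caption width=\"300\" style=\"height:'300px'\"} ",
  "Sample text and first image:  {caption width=\"300\" style=\"height:'300px'\"} for testing ok...",
];

// Same pattern as the answer: drop {...} blocks, colons, runs of dots,
// pairs of whitespace characters and a single trailing whitespace character.
const cleaned = texts.map((t) => t.replace(/\{.*\}|:|\.+|\s{2}|\s$/gi, ""));
console.log(cleaned);

// Added for illustration: split each cleaned string into words.
const words = cleaned.map((t) => t.split(" "));
console.log(words);
```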

Upvotes: 0

MarkokraM

Reputation: 990

My question may not have been clear enough, because the answers used JavaScript code to process the matches. My goal was a solution that uses a single expression only. I finally found this expression, which satisfies my needs:

((?!([^<]+)?>)([\w]+)(?!([^\{]+)?\})([\w]+))

http://regexr.com/3cb6j
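A quick way to confirm the expression is to run it with the g flag over the whole sample input using String.prototype.match. This is a minimal Node.js check; the input strings are copied from the question and the variable names are illustrative. Tracing the pattern also suggests one limitation: it is two \w+ groups in a row, so it only matches words of at least two word characters, and a one-letter word such as "a" is skipped (the last call below shows this).

```javascript
// The expression from this answer; the g flag makes match() return every hit.
const wordRe = /((?!([^<]+)?>)([\w]+)(?!([^\{]+)?\})([\w]+))/g;

// Sample input copied from the question.
const html = [
  '<p>Test image captions for GitBook:</p>',
  '<p>Second image: <img scr="./image2.png" alt="image title" title="image title">asdf</img>{caption width="300" style="height:\'300px\'"} </p>',
  '<p>Sample text and first image: <img scr="./image1.png" alt="image 1" /> {caption width="300" style="height:\'300px\'"} for testing ok...</p>',
].join("\n\n");

const found = html.match(wordRe);
console.log(found);

// Single-letter words are not matched, because the pattern needs two \w+ runs.
const short = "<p>I am a cat</p>".match(wordRe);
console.log(short); // [ 'am', 'cat' ]
```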

Upvotes: 1

void

Reputation: 36703

You can try this:

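var words = [];
$(function () {
  // Runs once the DOM is ready: split the text of every <p> on spaces
  $("p").each(function () {
    // concat() returns a new array, so the result must be assigned back
    words = words.concat($(this).text().split(" "));
  });
});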

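Once the DOM-ready handler has run, the words array contains every space-separated token from the paragraphs' text. Note that attached punctuation (such as the colon in "GitBook:") stays part of the tokens, and the text of the {caption ...} blocks is included as well.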
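The reason the assignment matters: Array.prototype.concat does not modify the array it is called on; it returns a new one. A minimal plain-JavaScript sketch (no jQuery or DOM; the strings are hypothetical stand-ins for $(this).text()):

```javascript
// Hypothetical stand-ins for the text of each <p>.
const paragraphTexts = ["Test image captions", "for GitBook:"];

let words = [];

// Calling concat() without using its result changes nothing.
paragraphTexts.forEach((text) => {
  words.concat(text.split(" ")); // new array is discarded
});
console.log(words.length); // 0

// Assigning the result back is what actually accumulates the words.
paragraphTexts.forEach((text) => {
  words = words.concat(text.split(" "));
});
console.log(words); // [ 'Test', 'image', 'captions', 'for', 'GitBook:' ]
```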

Upvotes: 0
