Reputation: 37
Let's say I have the following HTML:
<b>Item 1</b> Text <br>
<b>Item 2</b> Text <br>
<b>Item 3</b> Text <br>
<p><font color="#000000" face="Arial, Helvetica, sans-serif"><b>Item 4:</b></font></p>
<p><font color="#000000" face="Arial, Helvetica, sans-serif">Detailed Description</font></p>
I am using the following regex to capture the data: (Item 1:.*?<br>)/gi
which returns <b>Item 1</b> Text <br>
How do I remove the <b>, </b> and <br> tags so that I am left with:
Item 1 Text
I've been trying to make sense of this regex: <(\w+)[^>]*>.*<\/\1>, but so far with no luck. All the examples I have seen here seem to require an id or class, which my HTML does not have, so I'm a bit stuck adapting those examples to my problem.
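To see what the pattern from the question actually does, here is a small sketch run against two of the sample lines (the variable names are illustrative only). `[^>]*` matches zero or more characters other than `>`, so attributes such as `color` or `face` are optional and no `id` or `class` is needed; `\1` is a backreference that requires the closing tag to have the same name as the opening one. Note that the pattern matches a whole element including its tags, so on its own it does not strip the `<b>` and `<br>` markup, and the greedy `.*` runs to the *last* matching closing tag.

```javascript
// Pattern from the question: <(\w+)[^>]*> opens a tag and captures its name,
// .* takes as much as possible, and <\/\1> demands a closing tag whose
// name equals capture group 1.
const pairedTag = /<(\w+)[^>]*>.*<\/\1>/;

const simple = pairedTag.exec("<b>Item 1</b> Text <br>");
console.log(simple[0]); // <b>Item 1</b>
console.log(simple[1]); // b

// Attributes are allowed but not required: [^>]* simply skips over them.
const styled = pairedTag.exec(
  '<p><font color="#000000" face="Arial, Helvetica, sans-serif">Detailed Description</font></p>'
);
// Matching starts at the first "<", so the outer <p> element is captured.
console.log(styled[1]); // p
```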
Upvotes: 1
Views: 1584
Reputation: 896
In a regex, whatever is between parentheses () forms a capture group, whose contents can later be accessed as variables (\1, \2, \3, etc.) or, in some contexts, as $1, $2, $3. So simply use groups to capture the text you want.
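As a small illustration of the two ways to read a capture group in JavaScript (the sample string is just the first line of the question, and the pattern has the same shape as the one in this answer): `exec()` returns an array whose index 1 holds group 1, and in the replacement string passed to `String.prototype.replace`, `$1` and `$2` stand for groups 1 and 2.

```javascript
const sampleLine = "<b>Item 1</b> Text <br>";
const itemPattern = /<b>(Item \d+)<\/b>(.*?)<br>/;

// Reading groups from the match array: index 1 is group 1, index 2 is group 2.
const groups = itemPattern.exec(sampleLine);
console.log(groups[1]); // Item 1

// Reading groups in a replacement string: $1 and $2 refer to the same groups,
// so the whole match is rewritten without the surrounding tags.
const cleaned = sampleLine.replace(itemPattern, "$1$2").trim();
console.log(cleaned); // Item 1 Text
```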
I think this regex would work for you:
<b>(Item \d+)</b>(.*?)<br>
In detail, the expression means:
(Item \d+) : any string formatted as "Item [at least 1 digit]"
(.*?) : any sequence of characters (except line breaks); the ? makes the quantifier lazy, so it matches as few characters as possible.
So now in <b>Item 5434</b>hel34lo 0345 345<br>, with the regex above, your captured groups are:
\1 = Item 5434
\2 = hel34lo 0345 345
I've never programmed in JavaScript, but more precisely, this piece of code might work:
var myString = "<b>Item 5434</b>hel34lo 0345 345<br>";
// The slash in </b> must be escaped inside a regex literal.
var myRegexp = /<b>(Item \d+)<\/b>(.*?)<br>/g;
var match = myRegexp.exec(myString);
alert(match[1]); // Item 5434
alert(match[2]); // hel34lo 0345 345
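Because the pattern has the `g` flag, `exec()` can be called repeatedly to walk through every item rather than just the first one. The sketch below assumes the input is the question's list of `<b>Item N</b> Text <br>` lines; `extractItems` is a hypothetical helper name. It also shows why the lazy `(.*?)` matters when several items share one line: a greedy `(.*)` would run on to the last `<br>`.

```javascript
// Hypothetical helper: returns "Item N Text" for every <b>Item N</b> ... <br> pair.
function extractItems(html) {
  const re = /<b>(Item \d+)<\/b>(.*?)<br>/g;
  const results = [];
  let match;
  // With the g flag, each exec() call resumes at re.lastIndex and
  // returns null once no further match exists.
  while ((match = re.exec(html)) !== null) {
    results.push((match[1] + match[2]).trim());
  }
  return results;
}

const listHtml = "<b>Item 1</b> Text <br>\n<b>Item 2</b> Text <br>\n<b>Item 3</b> Text <br>";
console.log(extractItems(listHtml)); // ["Item 1 Text", "Item 2 Text", "Item 3 Text"]

// Lazy vs greedy when two items sit on the same line.
const oneLine = "<b>Item 1</b> A<br><b>Item 2</b> B<br>";
console.log(extractItems(oneLine)); // ["Item 1 A", "Item 2 B"]
const greedy = /<b>(Item \d+)<\/b>(.*)<br>/.exec(oneLine);
console.log(greedy[2]); // " A<br><b>Item 2</b> B"
```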
Upvotes: 0
Reputation: 3675
This regex will match b and br tags:
</?br?\s*/?>
To use it in Javascript you write something like this:
result = subject.replace(/<\/?br?\s*\/?>/img, "");
All the matched tags will be replaced with an empty string.
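Applied to a line from the question, the replacement removes the bold and break tags but leaves the surrounding spaces, so a `trim()` finishes the job. `stripBoldAndBreaks` is a hypothetical name; the `m` flag from the line above is dropped here because the pattern has no `^` or `$` for it to affect. Other tags such as `<p>` and `<font>` are left untouched.

```javascript
// Hypothetical helper: deletes <b>, </b>, <br>, <br/>, <br /> and </br>.
function stripBoldAndBreaks(html) {
  return html.replace(/<\/?br?\s*\/?>/gi, "");
}

const itemLine = "<b>Item 1</b> Text <br>";
console.log(JSON.stringify(stripBoldAndBreaks(itemLine))); // "Item 1 Text "
console.log(stripBoldAndBreaks(itemLine).trim()); // Item 1 Text

// Tags other than b/br survive:
console.log(stripBoldAndBreaks("<p><b>Item 4:</b></p>")); // <p>Item 4:</p>
```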
In my experience it is better to replace br tags with a space and to replace normal inline tags with an empty string. If that is what you want to do, this next regex matches only b tags:
</?b\s*/?>
and this one matches only br tags:
</?br\s*/?>
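A sketch of that advice, assuming the input is HTML like the question's: `<br>` variants become a space, `<b>` tags disappear, and, as an extra step not in the answer, runs of whitespace are collapsed. The last line shows the problem it avoids: deleting `<br>` outright glues adjacent items together. `tagsToText` is a hypothetical name.

```javascript
// Hypothetical helper following the advice above.
function tagsToText(html) {
  return html
    .replace(/<\/?br\s*\/?>/gi, " ") // line breaks become spaces
    .replace(/<\/?b\s*\/?>/gi, "")   // bold tags are simply removed
    .replace(/\s+/g, " ")            // extra step: collapse repeated whitespace
    .trim();
}

const twoItems = "<b>Item 1</b> Text<br><b>Item 2</b> Text<br>";
console.log(tagsToText(twoItems)); // Item 1 Text Item 2 Text

// Removing <br> with an empty string instead joins the items:
console.log(twoItems.replace(/<\/?br?\s*\/?>/gi, "")); // Item 1 TextItem 2 Text
```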
Upvotes: 1
Reputation: 4880
This should do the trick:
var matches = stringToTest.match(/(Item \d+.*?<br\/?>)/gi) || []; // match() returns null if nothing matches
for (var i = 0; i < matches.length; i++) {
    matches[i] = matches[i].replace(/<[^>]+>/g, ''); // strip every remaining tag
}
alert(matches);
If you have jQuery:
alert(
$.map(stringToTest.match(/(Item \d+.*?<br\/?>)/gi), function(v) { return v.replace(/<[^>]+>/g, '') })
);
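Without jQuery, `Array.prototype.map` does the same job as `$.map`. This sketch runs the answer's two regexes over the full HTML from the question; the `|| []` guard (an addition) covers the case where `match()` finds nothing and returns `null`, and `trim()` drops the space left before each `<br>`. The `Item 4:` line has no `<br>` after it, so it is not picked up. `extractWithoutJQuery` is a hypothetical name.

```javascript
// Hypothetical helper: same two regexes as the answer, plain JavaScript.
function extractWithoutJQuery(stringToTest) {
  // match() with the g flag returns every match, or null if there are none.
  const matches = stringToTest.match(/(Item \d+.*?<br\/?>)/gi) || [];
  return matches.map(function (v) {
    return v.replace(/<[^>]+>/g, "").trim();
  });
}

const questionHtml = `<b>Item 1</b> Text <br>
<b>Item 2</b> Text <br>
<b>Item 3</b> Text <br>
<p><font color="#000000" face="Arial, Helvetica, sans-serif"><b>Item 4:</b></font></p>
<p><font color="#000000" face="Arial, Helvetica, sans-serif">Detailed Description</font></p>`;

console.log(extractWithoutJQuery(questionHtml)); // ["Item 1 Text", "Item 2 Text", "Item 3 Text"]
console.log(extractWithoutJQuery("no items here")); // []
```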
Upvotes: 1
Reputation: 1866
Try this regex: <[^>]*>
It matches any HTML tag, with or without attributes, including closing tags, so replacing its matches with an empty string removes all the markup.
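Used with `replace()`, the pattern strips even the attribute-heavy `<font>` markup from the question. `stripAllTags` is a hypothetical name. One hedge: a regex is not an HTML parser, so a `>` inside a quoted attribute value (for example `title="a>b"`) ends the match early and leaves fragments behind, as the last line shows.

```javascript
// Hypothetical helper: removes every <...> tag, attributes and all.
function stripAllTags(html) {
  return html.replace(/<[^>]*>/g, "");
}

const fontLine =
  '<p><font color="#000000" face="Arial, Helvetica, sans-serif"><b>Item 4:</b></font></p>';
console.log(stripAllTags(fontLine)); // Item 4:
console.log(stripAllTags("<b>Item 1</b> Text <br>").trim()); // Item 1 Text

// Limitation: the ">" inside the attribute value cuts the first tag short.
console.log(stripAllTags('<span title="a>b">x</span>')); // b">x
```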
Upvotes: 3