Badr Hari

Reputation: 8424

Regex pattern: ignore whitespace, line breaks, tabs, etc.

I have something like this:

<div class="wp-caption">
    <a href="https://">
        <img src="https://" alt="blabla">
    </a>
</div>

And I want to replace it with:

<figure>
    <a href="https://">
        <img src="https://" alt="blabla">
    </a>
</figure>

I'm using a regex pattern like this: search for <div class="wp-caption">(.*)</div> and replace with <figure>(.*)</figure>

This works fine, but not when there are line breaks, spaces, tabs or other formatting inside. How can I tell it to ignore them?

I'm using Sublime Text, which uses Perl-style regular expressions.
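A minimal Perl sketch of why the pattern fails, using the sample HTML from the question. By default "." does not match a newline, so (.*) cannot reach a </div> on a later line; the /s modifier changes that. Sublime Text's engine is assumed here to treat "." the same way by default; the variable names are illustrative only.

```perl
use strict;
use warnings;

# Sample input from the question (multi-line HTML).
my $html = <<'HTML';
<div class="wp-caption">
    <a href="https://">
        <img src="https://" alt="blabla">
    </a>
</div>
HTML

# Without /s, "." stops at "\n", so (.*) cannot span the lines between
# the opening tag and </div>, and the pattern finds no match.
my $without_s = ($html =~ m{<div class="wp-caption">(.*)</div>}) ? 1 : 0;

# With /s, "." also matches "\n", and the same pattern succeeds.
my $with_s = ($html =~ m{<div class="wp-caption">(.*)</div>}s) ? 1 : 0;

print "without /s: $without_s\n";
print "with /s:    $with_s\n";
```

Running it should report no match without /s and a match with it.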

Upvotes: 2

Views: 7954

Answers (7)

Try this; it should work:

$variable =~ s!<div(?:\s+[^<>]*)?>(.*?)</div>!
           my $div_cont = $1;
           "<figure>" . $div_cont . "</figure>";
           !sge;

Upvotes: 0

Federico Piazza

Reputation: 31025

In addition to Bohemian's answer, if you don't want to use inline flags, you can use a regex trick like this:

<div class="wp-caption">([\s\S]*?)</div>

With the substitution string:

<figure>$1</figure>

The trick is [\s\S]: a character class that matches any whitespace or non-whitespace character, i.e. any character at all, including newlines.
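A small Perl sketch of the [\s\S] trick on the question's sample input. No /s modifier or inline (?s) is used: [\s\S] matches newlines on its own, and the lazy *? stops at the first </div>. The variable names are illustrative.

```perl
use strict;
use warnings;

my $html = <<'HTML';
<div class="wp-caption">
    <a href="https://">
        <img src="https://" alt="blabla">
    </a>
</div>
HTML

# Copy $html into $result, then substitute on the copy.
# [\s\S]*? spans the line breaks without any flag.
(my $result = $html) =~ s{<div class="wp-caption">([\s\S]*?)</div>}{<figure>$1</figure>}g;

print $result;
```

In a Perl replacement string $1 is the usual spelling; Sublime Text's replace field accepts $1 as in the answer above.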

Upvotes: 4

Bohemian

Reputation: 425288

This worked for me:

Find: (?s)<div class="wp-caption">(.*?)</div>
Replace: <figure>\1</figure>

The trick here is (?s), which makes the dot match newlines, and (.*?), which captures the contents of the <div> tag non-greedily (it stops consuming at the next </div>).

\1 is a back reference to capture group 1.

Upvotes: 1

In Sublime Text's Find/Replace panel, with regular expression mode enabled:

Find:

<div[^>]*>([\s\S]*?)</div>

Replace:

<figure>\1</figure>

Upvotes: 0

Paco Esteban

Reputation: 36

This should work, at least for your example:

s/div(\sclass="wp-caption")?/figure/g;
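A runnable Perl sketch of the tag-renaming idea. As an assumption beyond the answer, the pattern is anchored to < and > so that ordinary words containing "div" are left alone, and (/?) captures the slash of a closing tag so one substitution handles both tags. Because only the tags are matched, line breaks inside the element never come into play. Like the answer, it renames every </div> in the input, so it is only safe when the caption divs are the only divs.

```perl
use strict;
use warnings;

my $html = <<'HTML';
<div class="wp-caption">
    <a href="https://">
        <img src="https://" alt="blabla">
    </a>
</div>
HTML

# <(/?)div ...> matches "<div class=...>" (empty $1) and "</div>" ($1 = "/").
# The optional class attribute is dropped from the output.
(my $result = $html) =~ s{<(/?)div(?:\s+class="wp-caption")?>}{<$1figure>}g;

print $result;
```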

Upvotes: 0

hata

Reputation: 12503

How about using something like this in Perl?

s/<div class="wp-caption">(.*?)<\/div>/<figure>$1<\/figure>/sg
  • s/// performs a regex substitution.
  • The ? after .* makes the match non-greedy, so it stops at the first </div>.
  • The text captured by (.*?) is stored in the variable $1.
  • A / inside the pattern must be escaped with \ (backslash), because / is the delimiter.
  • The s modifier ("single-line" mode) makes . match newlines as well.
  • The g modifier makes the substitution global (every match is replaced, not just the first).
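To illustrate the non-greedy and g points together, here is a Perl sketch on a hypothetical input with two captions (not from the question). Greedy (.*) with /s runs to the last </div> and merges everything into one <figure>; (.*?) with /sg produces one <figure> per caption.

```perl
use strict;
use warnings;

# Hypothetical input: two captions separated by a paragraph.
my $html = qq{<div class="wp-caption">\n  <img src="a.png">\n</div>\n}
         . qq{<p>text</p>\n}
         . qq{<div class="wp-caption">\n  <img src="b.png">\n</div>\n};

# Greedy: (.*) backtracks only as far as the LAST </div>.
(my $greedy = $html) =~ s/<div class="wp-caption">(.*)<\/div>/<figure>$1<\/figure>/sg;

# Lazy: (.*?) stops at the FIRST </div>; /g repeats for each caption.
(my $lazy = $html) =~ s/<div class="wp-caption">(.*?)<\/div>/<figure>$1<\/figure>/sg;

# Count the <figure> tags each version produced (list-assignment count idiom).
my $greedy_count = () = $greedy =~ /<figure>/g;
my $lazy_count   = () = $lazy   =~ /<figure>/g;

print "greedy: $greedy_count figure(s)\n";
print "lazy:   $lazy_count figure(s)\n";
```

Running it should print 1 for the greedy version and 2 for the lazy one.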

Upvotes: 4

hjpotter92

Reputation: 80657

Use the dotall modifier as follows:

(?s)<div[^>]+>(.*?)</div>

and replace with:

<figure>$1</figure>

Upvotes: 0
