M Sach

Reputation: 34424

Ignoring the line break in regex?

I have the following content in a text file:

  some texting content <img  src="cid:part123" alt=""> <b> Test</b>

I read it from the file and store it in a String, inputString:

   expectedString = inputString.replaceAll("\\<img.*?cid:part123.*?>",
    "NewContent");

I get the expected output:

     some texting content NewContent <b> Test</b>

However, if there is an end-of-line character between img and src, as in the example below, the replacement does not work:

 <img  
          src="cid:part123" alt="">

Is there a way to make the regex ignore end-of-line characters while matching?

Upvotes: 6

Views: 9955

Answers (3)

lc.

Reputation: 116538

By default, the . character will not match newline characters. You can enable this behavior by specifying the Pattern.DOTALL flag. In String.replaceAll(), you do this by attaching a (?s) to the front of your pattern:

expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>", 
    "NewContent");

See also Pattern.DOTALL with String.replaceAll
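To make the difference concrete, here is a minimal sketch that runs the question's pattern with and without (?s) on the multi-line input. The class and method names (DotAllDemo, replaceDefault, replaceDotAll) are made up for illustration; only String.replaceAll and the patterns come from the posts above.

```java
class DotAllDemo {
    // Pattern from the question: '.' stops at line terminators.
    static String replaceDefault(String input) {
        return input.replaceAll("\\<img.*?cid:part123.*?>", "NewContent");
    }

    // Same pattern with the embedded (?s) flag: '.' also matches line terminators.
    static String replaceDotAll(String input) {
        return input.replaceAll("(?s)\\<img.*?cid:part123.*?>", "NewContent");
    }

    public static void main(String[] args) {
        // The tag is split across two lines, as in the failing case from the question.
        String multiLine = "some texting content <img  \n"
                + "          src=\"cid:part123\" alt=\"\"> <b> Test</b>";

        // Without (?s) nothing matches, so the input comes back unchanged.
        System.out.println(replaceDefault(multiLine).equals(multiLine));

        // With (?s) the whole tag, including the line break, is replaced.
        System.out.println(replaceDotAll(multiLine));
    }
}
```

The first call leaves the string untouched because the match cannot cross the line break; the second replaces the whole tag.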

Upvotes: 3

Rohit Jain

Reputation: 213401

If you want the dot (.) to match newlines as well, you can use the Pattern.DOTALL flag. Alternatively, with String.replaceAll(), you can add (?s) at the start of the pattern, which is equivalent to this flag.

From the Pattern.DOTALL Javadoc:

Dotall mode can also be enabled via the embedded flag expression (?s). (The s is a mnemonic for "single-line" mode, which is what this is called in Perl.)

So you can modify your pattern like this:

expectedStr = inputString.replaceAll("(?s)<img.*?cid:part123.*?>", "Content");

NOTE: You don't need to escape the angle bracket (<).
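As a rough illustration of the equivalence, the sketch below applies the same pattern once with the inline (?s) and once compiled with Pattern.DOTALL passed as a flag argument. The class, field, and method names are hypothetical; Pattern.compile(String, int) and Matcher.replaceAll(String) are standard java.util.regex calls.

```java
import java.util.regex.Pattern;

class DotAllFlagDemo {
    // Flag passed to Pattern.compile instead of being embedded in the regex.
    private static final Pattern IMG_TAG =
            Pattern.compile("<img.*?cid:part123.*?>", Pattern.DOTALL);

    // Inline-flag form, as used with String.replaceAll().
    static String withInlineFlag(String input) {
        return input.replaceAll("(?s)<img.*?cid:part123.*?>", "Content");
    }

    // Precompiled form; the regex itself carries no (?s).
    static String withCompileFlag(String input) {
        return IMG_TAG.matcher(input).replaceAll("Content");
    }

    public static void main(String[] args) {
        String input = "some texting content <img  \n"
                + "          src=\"cid:part123\" alt=\"\"> <b> Test</b>";

        // Both forms should produce the same result.
        System.out.println(withInlineFlag(input));
        System.out.println(withCompileFlag(input));
    }
}
```

The precompiled form may be preferable when the same replacement runs many times, since String.replaceAll() compiles the pattern on every call.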

Upvotes: 10

axtavt

Reputation: 242786

You need to use Pattern.DOTALL mode.

replaceAll() doesn't take mode flags as a separate argument, but you can enable them in the expression as follows:

expectedString = inputString.replaceAll("(?s)\\<img.*?cid:part123.*?>", ...);

Note, however, that it's not a good idea to parse HTML with regular expressions. It would be better to use an HTML parser instead.
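One way the regex approach can go wrong, sketched under the assumption that the input may contain more than one img tag: the lazy .*? starts at the first <img it finds and keeps going until it reaches cid:part123, so it can swallow an unrelated tag and the text in between. A negated character class ([^>]*) is a different technique from DOTALL, shown here only for comparison; it stays inside a single tag and matches line breaks without any flag, but it is still not a real HTML parser and breaks if an attribute value contains >. All names in the sketch are hypothetical.

```java
class ImgTagScopeDemo {
    // DOTALL pattern from the answers: may span several tags.
    static String lazyDot(String html) {
        return html.replaceAll("(?s)<img.*?cid:part123.*?>", "NewContent");
    }

    // Alternative sketch: [^>] cannot cross the end of a tag,
    // and it matches newlines without needing (?s).
    static String negatedClass(String html) {
        return html.replaceAll("<img[^>]*cid:part123[^>]*>", "NewContent");
    }

    public static void main(String[] args) {
        // Two img tags; only the second one contains cid:part123.
        String html = "<img src=\"a.png\"> text <img\n  src=\"cid:part123\" alt=\"\">";

        // Replaces everything from the first <img through the second tag.
        System.out.println(lazyDot(html));

        // Leaves the first tag and the text in place.
        System.out.println(negatedClass(html));
    }
}
```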

Upvotes: 1
