Working RegEx that fails in Perl find & replace one-liner

I have the following RegEx (<th>Password<\/th>\s*<td>)\w*(<\/td>) which matches <th>Password</th><td>root</td> in this HTML:

<tr>
    <th>Password</th>
    <td>root</td>
</tr>

However this Terminal command fails to find a match:

perl -pi -w -e 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g' file.html

It appears to have something to do with the whitespace between the </th> and the <td>, but <\/th>\s*<td> matches when I test the RegEx on its own, so why not in Perl?

I have tried replacing \s* with \n*, \r*, \t* and various combinations thereof, but still get no match.

Any help would be greatly appreciated.

Upvotes: 2

Answers (3)

Borodin

Reputation: 126762

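The -p option reads your file one line at a time, so the substitution is only ever applied to a single line and can never match a pattern that spans the line break between </th> and <td>.

You can read the entire file in at once using the -0777 option, like this: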

perl -w -0777 -pi -e 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g' file.html
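To see the difference between the two modes, here is a minimal sketch that runs the same substitution with and without -0777. The scratch directory and the variable names (demo, line_mode, slurp_mode) are only illustrative, and the commands print to stdout instead of editing in place with -i.

```shell
# Hypothetical scratch copy of the HTML from the question.
demo=$(mktemp -d)
printf '<tr>\n    <th>Password</th>\n    <td>root</td>\n</tr>\n' > "$demo/file.html"

# Default -p: each line is matched separately, so the two-line pattern never matches.
line_mode=$(perl -pe 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/${1}NEWPASSWORD$2/g' "$demo/file.html")

# -0777: the whole file is read as a single record, so \s* can cross the newline.
slurp_mode=$(perl -0777 -pe 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/${1}NEWPASSWORD$2/g' "$demo/file.html")

printf 'line by line:\n%s\n\nslurped:\n%s\n' "$line_mode" "$slurp_mode"
```

Only the slurped run should show NEWPASSWORD in the <td>. Writing ${1} instead of $1 is optional here and just makes the boundary between the capture variable and the literal text explicit.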

Note that it is far preferable to use a proper HTML parser, such as HTML::TreeBuilder::XPath, to process data like this, as it is very difficult to account for all possible representations of a given HTML construct using regular expressions.

Upvotes: 3

perreal

Reputation: 98118

You could use sed to do this, provided the <td> is on the line immediately after the <th>:

 sed -i '/<th>Password<\/th>/{n;s!<td>[^<]*!<td>NEWPASSWORD!}' file.html

Another sed version:

 sed -i '/<th>Password<\/th>/!b;n;s/<td>[^<]*/<td>NEWPASSWORD/' file.html
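As a rough illustration of how the n command behaves in these one-liners, the sketch below runs the first version against two hypothetical scratch files and prints to stdout rather than using -i (sed -i with no suffix argument is GNU sed syntax; BSD/macOS sed expects -i ''). The file names and the variables demo, sed_out and sed_oneline are assumptions for the demo, and a ; has been added before the closing brace, which some non-GNU sed implementations require.

```shell
# Hypothetical scratch files: one laid out as in the question, one with the row on a single line.
demo=$(mktemp -d)
printf '<tr>\n    <th>Password</th>\n    <td>root</td>\n</tr>\n' > "$demo/file.html"
printf '<tr><th>Password</th><td>root</td></tr>\n</table>\n' > "$demo/oneline.html"

# On a line containing the Password header, n prints it and loads the NEXT line;
# the s command then runs on that next line only.
sed_out=$(sed '/<th>Password<\/th>/{n;s!<td>[^<]*!<td>NEWPASSWORD!;}' "$demo/file.html")

# When header and value share a line, the next line holds no <td>, so nothing changes.
sed_oneline=$(sed '/<th>Password<\/th>/{n;s!<td>[^<]*!<td>NEWPASSWORD!;}' "$demo/oneline.html")

printf '%s\n---\n%s\n' "$sed_out" "$sed_oneline"
```

So the sed approach depends on the exact line layout, whereas slurping the file with perl -0777 does not.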

Upvotes: 2

Douglas Adams

Reputation: 1550

With -p, Perl processes a file one line at a time. In your example you're trying to match across two lines, so Perl never finds the end of the pattern on the first line and never finds the beginning of the pattern on the second line.

You can either flatten file.html to a single line temporarily (which might work if the file is small and performance is not important), or write more sophisticated logic to keep track of the lines already seen.
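One possible shape for the "keep track of lines" option is sketched below, assuming the value cell sits on the line directly after the Password header. The flag name $after_label, the extra Username row and the scratch file are all made up for the demo, and the command prints to stdout instead of editing in place.

```shell
# Hypothetical scratch file with an extra row, to show that other cells are left alone.
demo=$(mktemp -d)
printf '%s\n' \
    '<tr>' '    <th>Username</th>' '    <td>admin</td>' '</tr>' \
    '<tr>' '    <th>Password</th>' '    <td>root</td>' '</tr>' > "$demo/file.html"

tracked=$(perl -pe '
    # $after_label is an illustrative flag: true only when the previous line
    # contained the Password header.
    if ($after_label) { s/(<td>)\w*(<\/td>)/${1}NEWPASSWORD$2/ }
    # Remember for the next line whether this line was the header.
    $after_label = /<th>Password<\/th>/ ? 1 : 0;
' "$demo/file.html")

printf '%s\n' "$tracked"
```

This stays line-by-line, so it should cope with large files, but it silently does nothing if the header and the value are on the same line or separated by a blank line.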

Try searching for 'multiline regex perl' :)

Upvotes: 2
