Reputation: 91
I have the following RegEx:
(<th>Password<\/th>\s*<td>)\w*(<\/td>)
It matches the <th>Password</th> and <td>root</td> cells in this HTML:
<tr>
<th>Password</th>
<td>root</td>
</tr>
However, this Terminal command fails to find a match:
perl -pi -w -e 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g' file.html
It appears to have something to do with the whitespace between the </th> and the <td>, but <\/th>\s*<td> works in the RegEx, so why not in Perl?
I have tried replacing \s* with \n*, \r*, \t* and various combinations thereof, but still no match.
Any help would be greatly appreciated.
Upvotes: 2
Views: 190
Reputation: 126762
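With -p, the substitution is applied to only one line of your file at a time, so the pattern never sees the </th> and the <td> together.
You can read the entire file in at once using the -0 option with the value 777 (that is, -0777), like this: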
perl -w -0777 -pi -e 's/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g' file.html
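A quick way to see the difference is to run the same substitution with and without -0777 on a throwaway copy of the HTML from the question. This is only a sketch: the temporary directory and the variable names (tmp, re, line_mode, slurp_mode) are made up for illustration, and -i is left out so that nothing is edited in place.

```shell
# Build a scratch copy of the HTML from the question.
tmp=$(mktemp -d)
printf '<tr>\n<th>Password</th>\n<td>root</td>\n</tr>\n' > "$tmp/file.html"

# The substitution from the question, unchanged.
re='s/(<th>Password<\/th>\s*<td>)\w*(<\/td>)/$1NEWPASSWORD$2/g'

# Default -p: the code runs once per line, so no single line
# contains both the </th> and the <td> and nothing is replaced.
line_mode=$(perl -pe "$re" "$tmp/file.html")

# -0777 -p: the whole file is one record, so \s* can match the
# newline between the two tags.
slurp_mode=$(perl -0777 -pe "$re" "$tmp/file.html")

printf 'line mode:\n%s\n\nslurp mode:\n%s\n' "$line_mode" "$slurp_mode"
rm -rf "$tmp"
```

Once the slurp-mode output looks right, adding -i back applies the change to the real file.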
Note that it is far preferable to use a proper HTML parser, such as HTML::TreeBuilder::XPath, to process data like this, as it is very difficult to account for all possible representations of a given HTML construct using regular expressions.
Upvotes: 3
Reputation: 98118
You could use sed to do this:
sed -i '/<th>Password<\/th>/{n;s!<td>[^<]*!<td>NEWPASSWORD!}' file.html
Another sed version:
sed -i '/<th>Password<\/th>/!b;n;s/<td>[^<]*/<td>NEWPASSWORD/' file.html
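To check what the two commands do before letting them edit the file, they can be run without -i on a scratch copy of the question's HTML. This is a sketch that assumes GNU sed (other sed implementations can be stricter about `}` and `;` placement) and assumes the <td> sits on the line directly after the <th>. The names tmp, out_block and out_branch are illustrative only.

```shell
# Build a scratch copy of the HTML from the question.
tmp=$(mktemp -d)
printf '<tr>\n<th>Password</th>\n<td>root</td>\n</tr>\n' > "$tmp/file.html"

# Variant 1: on the line matching the <th>, `n` prints that line and
# loads the next one; the s command then rewrites the <td> contents.
out_block=$(sed '/<th>Password<\/th>/{n;s!<td>[^<]*!<td>NEWPASSWORD!}' "$tmp/file.html")

# Variant 2: `!b` jumps to the end of the script for every line that
# does NOT match, so `n` and `s` only run after the <th> line.
out_branch=$(sed '/<th>Password<\/th>/!b;n;s/<td>[^<]*/<td>NEWPASSWORD/' "$tmp/file.html")

printf '%s\n' "$out_block"
rm -rf "$tmp"
```

Both variants should give the same result here; they differ only in how the address is attached to the commands.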
Upvotes: 2
Reputation: 1550
With -p, Perl processes the file one line at a time. In your example you're trying to match across two lines, so on the first line Perl never finds the end of the pattern, and on the second line it never finds the beginning.
You can either flatten file.html to a single line temporarily (which might work if the file is small or performance is not important), or write more sophisticated logic that keeps track of what was found on the previous line.
Try searching for 'multiline regex perl' :)
Upvotes: 2