Reputation: 12240
I have a properties file with HTTP/database URLs such as the following:
http://localhost:8888/some_user?holiday=true
jdbc:hsqldb:hsql://localhost:9999/another_user?holiday=true&paid=true
jdbc:mysql://localhost:8888/some_user
http://localhost/some_user
Each URL appears on a separate line.
Each line can end with whitespace (spaces/tabs), \n, or nothing (if it's the last line).
The differences between lines:
I want to replace the port number (if it exists) and the user name with XXXX.
For example, the previous URLs should become:
http://localhost:XXXX/XXXX?holiday=true
jdbc:hsqldb:hsql://localhost:XXXX/XXXX?holiday=true&paid=true
jdbc:mysql://localhost:XXXX/XXXX
http://localhost/XXXX
Here is what I have done:
I broke it down into two regular expressions, which it looks like I have to do if I want to use lookarounds:
Replace the port number, if present:
perl -i -0777 -pe 's/(?<=localhost:)\d+/XXXX/g' file;
Then, replace user names:
perl -i -0777 -pe 's/(?<=localhost\/)(?<=localhost:XXXX\/)[\S&[^?]]*(?=[?\s\Z]?)/XXXX/g' file;
The second regex did not replace the user name. Any idea what is wrong?
Also, is there a simple way to handle the last line, which may not end in whitespace such as a newline?
Upvotes: 0
Views: 283
Reputation: 66964
Given the complexities of URL parsing in general, it is better to use the URI module.
Here is a two-pass regex. The first regex matches up to the first / or : (after the protocol identifier), followed by : and digits; the \K discards everything matched so far from the match, so only the port is replaced. The second regex replaces the run of consecutive non-? characters that starts at the first / after the protocol identifier.
perl -ple'
s{^ [^:]* :// [^/:]* : \K \d+ }{XXXX}x;
s{^ [^:]* :// [^/]* \K [^?]* }{/XXXX}x;
' input > output
There is no need to run two one-liners since this goes strictly by-line. Corrected code from ikegami.
Update (in response to the edit to the question)
The multiple protocols are processed correctly if the beginning of each regex is changed to
s{^ .*? :// ...
which matches anything up to, and including, the first ://. The rest is the same.
Upvotes: 2
Reputation: 85887
Why bother with a regex? The URI module can do it all for you:
perl -MURI -ple 'my $u = URI->new($_); $u->path("XXXX"); $u->_port("XXXX") if $u->_port; $_ = $u'
Upvotes: 3