rapt

Perl regex: replacing optional parts of a URL

I have a properties file with HTTP/database URLs such as the following:

http://localhost:8888/some_user?holiday=true
jdbc:hsqldb:hsql://localhost:9999/another_user?holiday=true&paid=true
jdbc:mysql://localhost:8888/some_user
http://localhost/some_user

Each URL appears in a separate line.

Each line can end with whitespace (spaces/tabs), \n, or nothing (if it's the last line).

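The lines differ in their protocol (http://, jdbc:hsqldb:hsql://, jdbc:mysql://), in whether a port is present, and in whether a query string follows the user name.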

I want to replace the port number (if it exists) and the user name with XXXX.

For example, the previous URLs should become:

http://localhost:XXXX/XXXX?holiday=true
jdbc:hsqldb:hsql://localhost:XXXX/XXXX?holiday=true&paid=true
jdbc:mysql://localhost:XXXX/XXXX
http://localhost/XXXX

Here is what I have done:

I broke it down into two regular expressions; it looks like I have to if I want to use lookarounds:

The second regex does not replace the user name. Any idea what is wrong?

Also, is there a simple way to handle the last line, which may not end in whitespace such as a newline?

Answers (2)

zdim

Given the complexities of URL parsing in general, it is better to use the URI module.

Here is a two-pass regex. The first regex matches everything up to the first / or : after the protocol identifier, followed by : and digits; \K ("keep") excludes everything matched before it from the replacement, so only the port is replaced. The second regex replaces the run of non-? characters that starts at the first / after the host, i.e. the path holding the user name.
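A quick way to see what \K does is to inspect $& after a match: everything before \K must still match, but only the part after it counts as the match, and therefore as the text that s/// replaces. A minimal sketch on the first sample URL (the variable names are illustrative, not from the answer):

```perl
use strict;
use warnings;

my $url = 'http://localhost:8888/some_user?holiday=true';

# The prefix (scheme, "://", host, ":") is required, but \K drops it
# from the reported match, so $& holds only the port digits.
if ($url =~ m{^ [^:]* :// [^/:]* : \K \d+ }x) {
    print "matched: $&\n";
}

# Same pattern in s///: only the digits after \K are replaced.
(my $masked = $url) =~ s{^ [^:]* :// [^/:]* : \K \d+ }{XXXX}x;
print "$masked\n";
```

This is also why \K fits better here than a lookbehind: Perl's lookbehind cannot contain an unbounded quantifier such as *, while \K puts no limit on the prefix.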

perl -ple'
    s{^ [^:]* :// [^/:]* : \K \d+ }{XXXX}x;
    s{^ [^:]* :// [^/]* \K [^?]* }{/XXXX}x;
' input > output

There is no need to run two separate one-liners, since both substitutions work strictly line by line. (Code corrections courtesy of ikegami.)
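On the question's last-line concern: -l chomps each line before the substitutions run, and chomp on a final line without a trailing newline simply removes nothing, so no special case is needed. One side effect: [^?]* in the second regex also swallows trailing spaces/tabs on a line without a query string. The sketch below redoes the one-liner as a script over in-memory input whose last line ends in a tab and has no newline. The mask_line helper and the [^?\s]* variant, which leaves trailing whitespace alone, are illustrative assumptions and not part of the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: the answer's two substitutions, except that the
# user-name match uses [^?\s]* so trailing spaces/tabs are preserved.
sub mask_line {
    my ($line) = @_;
    $line =~ s{^ [^:]* :// [^/:]* : \K \d+ }{XXXX}x;    # port, if any
    $line =~ s{^ [^:]* :// [^/]* \K [^?\s]* }{/XXXX}x;  # user name
    return $line;
}

# In-memory input: the last line ends with a tab and has no newline.
my $input = "http://localhost:8888/some_user?holiday=true\n"
          . "http://localhost/some_user\t";

open my $in, '<', \$input or die "Cannot open in-memory input: $!";
my @out;
while (my $line = <$in>) {
    chomp $line;    # what -l does; a no-op when there is no trailing "\n"
    push @out, mask_line($line);
}
close $in;

print "$_\n" for @out;
```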


Update (in response to the question's update)

Multi-part protocols such as jdbc:hsqldb:hsql:// are processed correctly once the beginning of both regexes is changed to

s{^ .*? ://  ...

so that it matches anything up to, and including, the first ://. The rest is the same.
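Putting the update together: a sketch of both substitutions with the non-greedy .*? prefix, wrapped in a hypothetical mask_url subroutine (the name is illustrative) and applied to the question's four sample URLs. It should reproduce the expected output listed in the question.

```perl
use strict;
use warnings;

# Both substitutions from the answer, with the leading [^:]* replaced by
# .*? so multi-part schemes like jdbc:hsqldb:hsql:// are skipped up to,
# and including, the first "://".
sub mask_url {
    my ($url) = @_;
    $url =~ s{^ .*? :// [^/:]* : \K \d+ }{XXXX}x;    # port, if present
    $url =~ s{^ .*? :// [^/]*    \K [^?]* }{/XXXX}x; # path (user name)
    return $url;
}

my @urls = (
    'http://localhost:8888/some_user?holiday=true',
    'jdbc:hsqldb:hsql://localhost:9999/another_user?holiday=true&paid=true',
    'jdbc:mysql://localhost:8888/some_user',
    'http://localhost/some_user',
);

my @masked = map { mask_url($_) } @urls;
print "$_\n" for @masked;
```

The port substitution simply fails to match on http://localhost/some_user, which has no ":" after the host, so that line only gets its user name replaced.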

melpomene

Why bother with a regex? The URI module can do it all for you:

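perl -MURI -ple 'my $u = URI->new($_); $u->path("XXXX"); $u->_port("XXXX") if $u->_port; $_ = $u'

The undocumented _port method is used rather than port because port falls back to the scheme's default (80 for http) when the URL has no explicit port, whereas _port returns only a port that is actually present.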
