gextra

Reputation: 8885

How to convert a wildcard pattern to regex in Erlang?

Wildcard patterns are a file system convention in which ( ? ) matches any single character and ( * ) matches any sequence of characters.

I am trying to use the erlang re:replace/3 function to replace:

a) * into .*

b) ? into .

c) . into \.

d) if a wildcard pattern does not start with a wildcard, then add a ^ (start-match in regex) to the start of the pattern

e) if a wildcard pattern does not end with a wildcard, then add a $ (end-match in regex) to the end of the pattern

Somehow I am unable to get the re:replace to achieve this.

Examples:

trying to replace based on item a) above

re:replace("something*.log","\*","\.\*").
exception error: bad argument

Upvotes: 0

Views: 1040

Answers (2)

legoscia

Reputation: 41568

In your re:replace call:

re:replace("something*.log","\*","\.\*").

the backslashes don't actually end up in the strings, because in an Erlang string literal a backslash escapes the character that follows it. Some backslash escapes have special meanings, such as "\n" meaning a newline, but the ones that don't simply let the following character through unchanged:

4> "\*".
"*"

So you need a double backslash for the backslash to actually be part of the string:

5> re:replace("something*.log","\\*","\.\*").
[<<"something">>,<<".*">>|<<".log">>]

Note that the backslashes in "\.\*" are not needed.
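
One way to check what a string literal really contains is to count its characters. A minimal sketch (the module name esc_demo is made up for illustration):

```erlang
-module(esc_demo).
-export([lengths/0]).

%% Returns the lengths of three string literals, to show how many
%% characters each one really contains once the escapes are processed.
lengths() ->
    {length("\*"),    % 1: the backslash is dropped, only "*" remains
     length("\\*"),   % 2: a real backslash followed by "*"
     length("\n")}.   % 1: a single newline character
```

Calling esc_demo:lengths() should return {1,2,1}.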

By default re:replace returns an iolist, as in the result above. That is usually perfectly useful (in particular if you want to write the result to a file or a socket), but sometimes you may want a plain string, at some additional memory and CPU cost. For that, pass a fourth argument to re:replace:

7> re:replace("something*.log","\\*","\.\*", [{return, list}]).   
"something.*.log"

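Putting the pieces together, here is a rough sketch of the whole conversion built only on re:replace/4, following rules a) to e) exactly as stated in the question. The module name wc and the helpers anchor_start/1 and anchor_end/1 are made up for illustration. It assumes the pattern is a plain Latin-1 string, and it does not escape other regex metacharacters such as + or (. Note the global option: without it re:replace only replaces the first match.

```erlang
-module(wc).
-export([to_regex/1]).

%% Convert a wildcard pattern (a string) to a regex string.
to_regex(Pattern) when is_list(Pattern) ->
    Opts = [global, {return, list}],
    %% c) escape literal dots first, so the dots added below stay unescaped.
    %%    "\\\\." is the replacement \\. which re turns into backslash + dot.
    P1 = re:replace(Pattern, "\\.", "\\\\.", Opts),
    %% a) "*" becomes ".*"
    P2 = re:replace(P1, "\\*", ".*", Opts),
    %% b) "?" becomes "."
    P3 = re:replace(P2, "\\?", ".", Opts),
    anchor_start(Pattern) ++ P3 ++ anchor_end(Pattern).

%% d) add "^" unless the pattern starts with a wildcard
anchor_start([C | _]) when C =:= $*; C =:= $? -> "";
anchor_start(_) -> "^".

%% e) add "$" unless the pattern ends with a wildcard
anchor_end([]) -> "$";
anchor_end(Pattern) ->
    case lists:last(Pattern) of
        C when C =:= $*; C =:= $? -> "";
        _ -> "$"
    end.
```

With this sketch, wc:to_regex("something*.log") should give "^something.*\\.log$". The order of the three replacements matters: if the dots were escaped last, the dots produced by rules a) and b) would be escaped as well.
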
Upvotes: 0

Pascal

Reputation: 14042

If you are confident that your spec is complete, you can write the conversion directly. I would not expect a performance problem, because regular expressions are generally short lists:

-module(rep).
-export([replace/1]).

replace(L) when is_list(L) -> lists:reverse(replace(L,wildcard(hd(L)))).

% take care of the first character
replace(L,W={true,_}) -> replace(L,W,[]);
replace(L,W={false,_}) -> replace(L,W,[$^]).

% take care of the last character
replace([_],{true,R},Res) -> R ++ Res;
replace([_],{false,R},Res) -> [$$|R] ++ Res;
% middle characters
replace([_|Q],{_,R},Res) -> replace(Q,wildcard(hd(Q)),R++Res).

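% returns {IsWildcard, ReversedRegexFragment}; the fragment is reversed
% because the result is accumulated backwards
wildcard($*) -> {true,[$*,$.]};
wildcard($?) -> {true,[$.]};
% a literal dot is escaped, but it is not a wildcard, so ^ and $ still apply
wildcard($.) -> {false,[$.,$\\]};
wildcard(C) -> {false,[C]}.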

with your example:

11> rep:replace("something*.log").
"^something.*\\.log$"

Note that the \\ in the printed string is one single backslash character, as you can verify with:

12> length(rep:replace("something*.log")).
18
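
To check that a generated regex behaves as intended, you can feed it to re:run/3. A minimal sketch, assuming the regex is a string like the one printed above (the module name wc_match and the function matches/2 are made up for illustration):

```erlang
-module(wc_match).
-export([matches/2]).

%% Returns true if Name matches Regex, false otherwise.
%% Regex is assumed to be a regex string such as "^something.*\\.log$".
matches(Name, Regex) ->
    %% {capture, none} makes re:run return just match or nomatch
    case re:run(Name, Regex, [{capture, none}]) of
        match -> true;
        nomatch -> false
    end.
```

For example, wc_match:matches("something-1.log", "^something.*\\.log$") should return true, while "somethingxlog" should not match, because the escaped dot only matches a literal dot.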

Upvotes: 1
