Andrey Minakov
Andrey Minakov

Reputation: 576

Regexp in Power Query using JavaScript

I need Regexp, while M / Power Query doesn't have native support to it. I found several variants of solution around the same excellent Web.Page & JavaScript idea - Biccauntant, Hugoberry.

I had to adopt them (the resulting code is below) due to JavaScript limits.

The main problem is that JavaScript has it's own tuff limit on the String variables - it is not possible to use "\" in it.

My variant tackles the problem, but I'm not professional in JS, so the main question is - I suspect that the "\" problem is not the only one.

So my question is - does anybody see some other problems with PQ & JavaScript "co-operation" while using the Regexp? Regarding other pitfalls with strings, or whatever?

I understand that Regexp has it's own escaping rules (this is not only JavaScript demand, but Regexp itself), so escaping of Regex itself is out of the scope of the question. I.e. regular expression should be properly escaped BEFORE it may be passed as the function parameter. In other words, it is supposed, that if a user wants to use regular expression with "\", he must use "\\" instead, and my function will convert it to "\\\\", which will be passed to JavaScript as Regexp expression.

For PQ users - it turned out that the performance of the solution is quite good for hundreds of rows at least. But not forget that it is not possible to use the function in Power BI service, this is for PowerBI Desktop and Excel only.

The code:

(text as nullable text, pattern as nullable text) as logical =>
    let 
        l = List.Transform({text, pattern}, each Text.Replace(_, "\", "\\")), 
        t = Text.Format("<script>document.write(new RegExp('#{1}').test('#{0}'))</script>", l),
        w = Web.Page(t), 
        d = w[Data]?{0}?[Children]?{0}?[Children]?{1}?[Text]?{0}?, 
        result = text <> null and (pattern = null or (if d <> null then Logical.FromText(d) else error "Regular expression or text are not supported by JavaScript."))
    in
        result

Upvotes: 6

Views: 2619

Answers (2)

GMCB
GMCB

Reputation: 337

Based on the code above (and correctly escaping single quotes '), I suggest the following functions (I include 3 functions, one for testing, one for matching and the other one for replacements).

NOTE: There is one issue that I am unable to fix. In the matching function the list returned will be incorrect if the string or matches contain content with a comma (','). I looked at using JSON.stringify() to output the match but it does not work, will open a question here and fix it later if possible.

Function to do regular expression testing

// regexTest
let   regexTest=(text as nullable text,replace as nullable text, optional flags as nullable text) as logical =>
    let
        f= if flags = null or flags ="" then "" else flags,
        l1 = List.Transform({text, replace}, each Text.Replace(_, "\", "\\")),
        l2 = List.Transform(l1, each Text.Replace(_, "'", "\'")),
        l3 = List.Combine({l2,{f}}),
        t = Text.Format("<script>document.write(new RegExp('#{1}','#{2}').test('#{0}'))</script>", l3),
        r = Web.Page(t)[Data]{0}[Children]{0}[Children]{1}[Text]{0},
        Output = Logical.From(r)
    in Output
    
in regexTest

Function to do regular expression matching

// regexMatch
let   regexMatch=(text as nullable text, pattern as nullable text, optional flags as nullable text) as list =>
    let
        f=if flags = null or flags ="" then "" else flags,
        l1 = List.Transform({text, pattern}, each Text.Replace(_, "\", "\\")),
        l2 = List.Transform(l1, each Text.Replace(_, "'", "\'")),
        t = Text.Format("<script>var txt='#{0}';document.write(txt.match(new RegExp('#{1}','#{2}')));</script>", List.Combine({l2,{f}})),
        r=Web.Page(t)[Data]{0}[Children]{0}[Children],
        Output=if List.Count(r)>1 then Text.SplitAny(r{1}[Text]{0},",") else {}
    in Output
in regexMatch

Function to do Regular expression replacing

// regexReplace
let   regexReplace=(text as nullable text,pattern as nullable text,replace as nullable text, optional flags as nullable text) as text =>
    let
        f=if flags = null or flags ="" then "" else flags,
        l1 = List.Transform({text, pattern, replace}, each Text.Replace(_, "\", "\\")),
        l2 = List.Transform(l1, each Text.Replace(_, "'", "\'")),
        t = Text.Format("<script>var txt='#{0}';document.write(txt.replace(new RegExp('#{1}','#{3}'),'#{2}'));</script>", List.Combine({l2,{f}})),
        r=Web.Page(t)[Data]{0}[Children]{0}[Children],
        Output=if List.Count(r)>1 then r{1}[Text]{0} else ""
    in Output
in regexReplace

Upvotes: 4

David Knipe
David Knipe

Reputation: 3454

I know javascript, but I don't know Powerbi/Powerquery so this answer may be incomplete.

There are other characters which need to be escaped in string literals.

For single-quoted string literals, \ and ' need to be escaped, as do some whitespace characters such as newline. There's a list of escapes at https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/String . Some of these are optional, but I'm not sure exactly which ones. I don't think you ever need any of the ones with hex digits. I said that ' needs to be escaped but not ", but if you were using a double-quoted string ("#{0}") it would be the other way round.

Apart from that, I can't tell you what escaping needs to be done for Powerquery. In fact I'm surprised that your string literal "\\" worked as intended.

In summary, it should look something like this (and feel free to correct my Powerquery syntax errors):

// Define function escapeString which inserts \ before each \ or ' and then applies the escape sequences \n, \r, \v, \t, \b and \f.
let 
    l = List.Transform({text, pattern}, each escapeString(_)), 
...

Upvotes: 3

Related Questions