Nick
Nick

Reputation: 775

Extract first integers/decimals from units/string using regex in excel

I wish to separate the following data using regex function as follows:

enter image description here

Function to be used:

let   fx=(text,regex)=>
    Web.Page(
        "<script>
            var x='"&text&"';
            var y=new RegExp('"&regex&"','g');
            
            var b=x.match(y);
            document.write(b);
        </script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}

in
fx

Agurments:

regex - \\d+\\.?\\d+

This successfully extracts the numerical values, however:

  1. I am unsure if this is the correct regex to remove the first integer/number.
  2. I am unsure how to use regex to extract only the units. It seems to run into errors despite various attempts. e.g. \D+ doesn't return the non-numerical values despite it working on the link. This being said for 15 ng/m3 if this did work it would only return ng/m3. I wonder if there is an issue with the function itself.

M Code:

let
    Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Invoked Custom Function1" = Table.AddColumn(#"Changed Type", "fnRegexExtr2", each fnRegexExtr2([Column1], "\\d+\\.?\\d+")),
    #"Invoked Custom Function" = Table.AddColumn(#"Invoked Custom Function1", "fnRegexExtr2.1", each fnRegexExtr2([fnRegexExtr2], "\\D+"))
in
    #"Invoked Custom Function"

update with "^[^\s]+":

enter image description here

data:

1200 mg/kg bw/day
24 mg/kg/day
0.79 mg/kg bw/day
15 ng/m3
15 ng/m 3
Not Limited
 30mg/m³

Upvotes: 1

Views: 147

Answers (2)

JvdV
JvdV

Reputation: 75860

There is no issue with the function itself, the problem lays with the patterns used:


Numeric part: You currently use \\d+\\.?\\d+ which basically means; "Any 1+ digit followed by an optional dot and at least another 1+ digits". So, at least two digits anywhere in a string. The appropriate regex would be:

^\\d+(?:\\.\\d+)?

Meaning:

  • ^ - Start-line anchor;
  • \\d+ - 1+ (Greedy) digits;
  • (?:\\.\\d+)? - Optional non-capture group to match a literal dot followed by at least 1+ more digits.

Units: Your pattern [\D+][2] matches a single character from the 1st class that is either a non-digit or a literal plus sign. Your 2nd character class matches a literal 2. So you are looking for patterns like 'A2' or '+2' etc. The appropriate regex would be dependend on your input.


Proposal:

In a previous answer I already suggested a different JS-based function, to replace data rather than to match data. Therefor add:

(x,y,z)=>
let 
   Source = Web.Page(
                     "<script>var x="&"'"&x&"'"&";var z="&"'"&z&
                     "'"&";var y=new RegExp('"&y&"','g');
                     var b=x.replace(y,z);document.write(b);</script>")
                     [Data]{0}[Children]{0}[Children]{1}[Text]{0}
in 
   Source

Now for both columns try pattern:

^(\\d+(?:\\.\\d+)?)?\\s*(.+)$

Replace the values with $1 for the numeric part, and with $2 for the leftover unit.

enter image description here

let
    Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Invoked Custom Function1" = Table.AddColumn(#"Changed Type", "Nr", each fnRegexExtr([Column1], "^(\\d+(?:\\.\\d+)?)?\\s*(.+)$", "$1")),
    #"Invoked Custom Function2" = Table.AddColumn(#"Invoked Custom Function1", "Unit", each fnRegexExtr([Column1], "^(\\d+(?:\\.\\d+)?)?\\s*(.+)$", "$2")),
    #"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function2", {{"Nr", null}, {"Unit", null}})
in
    #"Replaced Errors"

A 2nd option would be to replace the value with a delimiter where you later split on:

let
    Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content],
    #"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
    #"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "fnRegexExtr", each fnRegexExtr([Column1], "^(\\d+(?:\\.\\d+)?)?\\s*(.+)$", "$1|$2")),
    #"Split Column by Delimiter" = Table.SplitColumn(#"Invoked Custom Function", "fnRegexExtr", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Nr", "Unit"})
in
    #"Split Column by Delimiter"

Upvotes: 2

Bumbl3b33
Bumbl3b33

Reputation: 5

Thanks for posting. Looks like you need to match characters up the first space. For that you can use the following regex expression.

^[^\s]+

Upvotes: 0

Related Questions