Reputation: 775
I wish to separate the following data using regex function as follows:
Function to be used:
let fx=(text,regex)=>
Web.Page(
"<script>
var x='"&text&"';
var y=new RegExp('"®ex&"','g');
var b=x.match(y);
document.write(b);
</script>")[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
fx
Agurments:
regex - \\d+\\.?\\d+
This successfully extracts the numerical values, however:
M Code:
let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Invoked Custom Function1" = Table.AddColumn(#"Changed Type", "fnRegexExtr2", each fnRegexExtr2([Column1], "\\d+\\.?\\d+")),
#"Invoked Custom Function" = Table.AddColumn(#"Invoked Custom Function1", "fnRegexExtr2.1", each fnRegexExtr2([fnRegexExtr2], "\\D+"))
in
#"Invoked Custom Function"
update with "^[^\s]+":
data:
1200 mg/kg bw/day
24 mg/kg/day
0.79 mg/kg bw/day
15 ng/m3
15 ng/m 3
Not Limited
30mg/m³
Upvotes: 1
Views: 147
Reputation: 75860
There is no issue with the function itself, the problem lays with the patterns used:
Numeric part: You currently use \\d+\\.?\\d+
which basically means; "Any 1+ digit followed by an optional dot and at least another 1+ digits". So, at least two digits anywhere in a string. The appropriate regex would be:
^\\d+(?:\\.\\d+)?
Meaning:
^
- Start-line anchor;\\d+
- 1+ (Greedy) digits;(?:\\.\\d+)?
- Optional non-capture group to match a literal dot followed by at least 1+ more digits.Units: Your pattern [\D+][2]
matches a single character from the 1st class that is either a non-digit or a literal plus sign. Your 2nd character class matches a literal 2. So you are looking for patterns like 'A2' or '+2' etc. The appropriate regex would be dependend on your input.
Proposal:
In a previous answer I already suggested a different JS-based function, to replace data rather than to match data. Therefor add:
(x,y,z)=>
let
Source = Web.Page(
"<script>var x="&"'"&x&"'"&";var z="&"'"&z&
"'"&";var y=new RegExp('"&y&"','g');
var b=x.replace(y,z);document.write(b);</script>")
[Data]{0}[Children]{0}[Children]{1}[Text]{0}
in
Source
Now for both columns try pattern:
^(\\d+(?:\\.\\d+)?)?\\s*(.+)$
Replace the values with $1
for the numeric part, and with $2
for the leftover unit.
let
Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Invoked Custom Function1" = Table.AddColumn(#"Changed Type", "Nr", each fnRegexExtr([Column1], "^(\\d+(?:\\.\\d+)?)?\\s*(.+)$", "$1")),
#"Invoked Custom Function2" = Table.AddColumn(#"Invoked Custom Function1", "Unit", each fnRegexExtr([Column1], "^(\\d+(?:\\.\\d+)?)?\\s*(.+)$", "$2")),
#"Replaced Errors" = Table.ReplaceErrorValues(#"Invoked Custom Function2", {{"Nr", null}, {"Unit", null}})
in
#"Replaced Errors"
A 2nd option would be to replace the value with a delimiter where you later split on:
let
Source = Excel.CurrentWorkbook(){[Name="Tabel1"]}[Content],
#"Changed Type" = Table.TransformColumnTypes(Source,{{"Column1", type text}}),
#"Invoked Custom Function" = Table.AddColumn(#"Changed Type", "fnRegexExtr", each fnRegexExtr([Column1], "^(\\d+(?:\\.\\d+)?)?\\s*(.+)$", "$1|$2")),
#"Split Column by Delimiter" = Table.SplitColumn(#"Invoked Custom Function", "fnRegexExtr", Splitter.SplitTextByDelimiter("|", QuoteStyle.Csv), {"Nr", "Unit"})
in
#"Split Column by Delimiter"
Upvotes: 2
Reputation: 5
Thanks for posting. Looks like you need to match characters up the first space. For that you can use the following regex expression.
^[^\s]+
Upvotes: 0