screechOwl
screechOwl

Reputation: 28159

Azure / U-SQL - Regex substitution

I have some data containing spaces () and hyphens (-) that I'd like to convert to underscore characters (_). In other languages (ie R) I can write something like this:

var1 <- gsub(var1, "s+|\\-", "_")

which will look for multiple characters and convert them all to some other character.

Is there a way to do this in U-SQL?

EDIT:

I tried this and it ran without error but did not change the data:

@t2 = SELECT var1,
           var2,
           var3.Replace("s+|\\'|\\-","_") AS var3          
    FROM @t1;

Upvotes: 0

Views: 1353

Answers (2)

David Paul Giroux
David Paul Giroux

Reputation: 657

@someData =
SELECT * FROM
    ( VALUES
    ("tic tac-toe")
    ) AS T(col1);

DECLARE  @pattern string = "\\s|-";

@result =
SELECT  col1 AS original,
        Regex.Replace(col1,  "\\s",    "_") AS regex_replaceSpace,
        Regex.Replace(col1,  "-",    "_") AS regex_replaceHypen,
        Regex.Replace(col1,  "\\055",    "_") AS regex_replaceHypenDecimal,
        Regex.Replace(col1,  "\\s|-",    "_") AS regex_replaceBoth,
        Regex.Replace(col1,  @"\s|-",    "_") AS regex_replaceBoth_verbatim,
        Regex.Replace(col1,  @pattern,    "_") AS regex_replaceBoth_pattern,

        col1.Replace(" ", "_") AS string_replaceSpace,
        col1.Replace("-", "_") AS string_replaceHypen,
        col1.Replace("-", "_").Replace(" ", "_") AS string_replaceBoth
FROM @someData;

OUTPUT @result
TO "/Replace.csv"
USING Outputters.Csv(outputHeader: true);

Upvotes: 0

Peter Bons
Peter Bons

Reputation: 29780

You are almost there but you are using the System.String.Replace instead of the one for a regular expression. So change

@t2 = SELECT var1,
       var2,
       var3.Replace("s+|\\'|\\-","_") AS var3          
FROM @t1;

to

@t2 = SELECT var1,
       var2,
       Regex.Replace(var3, "s+|\\'|\\-", "_") AS var3          
FROM @t1;

Edit: I am not an expert on regular expression so I did not validate the expression itself.

Upvotes: 3

Related Questions