Syam Kumar
Syam Kumar

Reputation: 383

Can I split a column text as array using data factory data flow?

Inside my data flow pipeline I would like to add a derived column and its datatype is array. I would like to split the existing column with 1000 characters without breaking words. I think we can use regexSplit,

regexSplit(<string to split> : string, <regex expression> : string) => array

But I do not know which regular expression I can use for split the existing column without breaking words. Please help me to figure it out.

Upvotes: 3

Views: 4034

Answers (2)

Syam Kumar
Syam Kumar

Reputation: 383

I created a workaround for this and it works fine for me.

filter(split(regexReplace(regexReplace(text, `[\t\n\r]`, ``), `(.{1,1000})(?:\s|$)`, `$1~~`), '~~'), #item !="")

I think, we have a better solution than this.

Upvotes: 1

gijswijs
gijswijs

Reputation: 2148

I wouldn't use a regex for this, but a truncating function like this one, courtesy of TimS:

public static string TruncateAtWord(this string input, int length)
{
    if (input == null || input.Length < length)
        return input;
    int iNextSpace = input.LastIndexOf(" ", length, StringComparison.Ordinal);
    return string.Format("{0}…", input.Substring(0, (iNextSpace > 0) ? iNextSpace : length).Trim());
}

Translated into expression functions it would look something* like this.

substring(Input, 1, iif(locate(Input, ' ', 1000) > 0, locate(Input, ' ', 1000) , length(Input)) )

Since you don't have a lastIndexOf available as an expression function, you would have to default to locate, which means that this expression truncates the string at the first space after the 1000th character.

*I don't have an environment where I can test this.

Upvotes: 0

Related Questions