CodeAssasin

Reputation: 97

Split a concatenated field delimited by pipe

I have a field whose value is a set of fields concatenated with | (pipe) as the delimiter.

Note: the escape character is also a pipe, so a literal pipe inside a field is written as ||.

Given:
AB|||1|BC||DE

Required:
["AB|","1","BC|DE"]

How can I split the given string into an array or list without iterating character by character (e.g. using a regex or some other method) to get what is required?

Upvotes: 0

Views: 148

Answers (2)

Nathan Hughes

Reputation: 96385

If there's an unused character you can substitute for the doubled pipe, you could do this:

groovy:000> s = "AB|||1|BC||DE"
===> AB|||1|BC||DE
groovy:000> Arrays.asList(s.replaceAll('\\|\\|', '@').split('\\|'))*.replaceAll('@', '|')
===> [AB|, 1, BC|DE]

Cleaned up with a magic char sequence and using tokenize it would look like:

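pipeChars = 'ZZ' // placeholder; any sequence that never occurs in the data
// tokenize takes a set of delimiter characters, not a regex, so the pipe is not escaped
s.replaceAll('\\|\\|', pipeChars).tokenize('|')*.replaceAll(pipeChars, '|')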
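
If no safe placeholder is available, the same left-to-right pairing can be sketched with a single regex that matches whole fields instead of splitting on delimiters. This is only a sketch, not part of the original answer: the helper name splitEscapedPipes is hypothetical, and like tokenize it drops empty fields.

```groovy
// Hypothetical helper: a field is a run of non-pipe characters and doubled pipes.
// Any single pipe left over after pairing acts as a delimiter and is skipped.
List<String> splitEscapedPipes(String s) {
    s.findAll(/(?:[^|]|\|\|)+/)*.replace('||', '|')
}

assert splitEscapedPipes('AB|||1|BC||DE') == ['AB|', '1', 'BC|DE']
```

Because the regex consumes doubled pipes greedily from the left, it makes the same grouping assumption as the placeholder approach.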

Of course this assumes that it's valid to go left-to-right across the string grouping the pipes into pairs, so each pair becomes a single pipe in the output, and the left-over pipes become the delimiters. When you start with something like

['AB|', '|1', 'BC|DE']

which gets encoded as

AB|||||1|BC||DE

then the whole encoding scheme falls apart: it's entirely unclear how to group the pairs of pipes in order to recover the original values. 'X|||||Y' could have been generated by ['X|', '|Y'], ['X||', 'Y'], or ['X', '||Y'], and there is no way to know which it was.
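
The ambiguity can be checked with a small sketch. The encode function below is an assumption about how such values are produced (double every pipe inside a field, then join the fields with a single pipe); it is not taken from the question.

```groovy
// Hypothetical encoder: escape each pipe by doubling it, then join fields with '|'.
String encode(List<String> fields) {
    fields.collect { it.replace('|', '||') }.join('|')
}

def candidates = [['X|', '|Y'], ['X||', 'Y'], ['X', '||Y']]
def encoded = candidates.collect { encode(it) }

// All three field lists collapse to the same encoded string.
assert encoded.every { it == 'X|||||Y' }
```

Since three different inputs produce one output, no decoder can recover the original fields from that string alone.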

Upvotes: 1

mikemil

Reputation: 1241

How about using the split method? Note that split takes a regex, so the pipe has to be escaped, as in split('\\|'). From what you provided, though, it looks like you can also have the '|' character in the field value. Any chance you can change the delimiter character to something that does not appear in the resulting values?

Upvotes: 0
