YuliaPro

Reputation: 305

Using TOKENIZE in Pig

I am trying to use the TOKENIZE function in Pig on a document that is comma separated. I would like to split on the commas, but NOT on whitespace. For example, I would like the list (car, toy car, bunny) to become ((car), (toy car), (bunny)), not ((car), (toy), (car), (bunny)). Is there a way to do this?

Upvotes: 1

Views: 8844

Answers (2)

Debaditya

Reputation: 2497

As an alternative, you can try the FLATTEN operator as well.

Example:

Input -> (a,(b,c))

B = FOREACH A GENERATE $0, FLATTEN($1);

Output -> (a,b,c)

FLATTEN and TOKENIZE can also be used together, as in the word count problem.
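
Applied to the question, a minimal sketch could look like the following. It assumes a Pig release in which TOKENIZE accepts an optional field delimiter as its second argument, and the file name items.txt and the aliases are hypothetical. Check both against your own setup.

```pig
-- Hypothetical input file 'items.txt' with one line: car, toy car, bunny
-- TextLoader reads each line as a single chararray, so the commas are left in place.
A = LOAD 'items.txt' USING TextLoader() AS (line:chararray);

-- With ',' as the delimiter, TOKENIZE should split on commas only, not on spaces.
-- FLATTEN turns the resulting bag into one row per token.
B = FOREACH A GENERATE FLATTEN(TOKENIZE(line, ',')) AS item;

-- Tokens keep the space that followed each comma, so trim it off.
C = FOREACH B GENERATE TRIM(item) AS item;

DUMP C;
```

Without the second argument, TOKENIZE falls back to its default separators, which include the space, and that is what breaks "toy car" into two tokens.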

Upvotes: 0

Romain

Reputation: 7082

Have you had a look at STRSPLIT for splitting on just the comma?

(It works on a CHARARRAY, like TOKENIZE.)
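
A minimal sketch of that approach, assuming the whole line is loaded as a single chararray (the file name items.txt and the aliases are hypothetical):

```pig
-- Hypothetical input file 'items.txt' with one line: car, toy car, bunny
A = LOAD 'items.txt' USING TextLoader() AS (line:chararray);

-- STRSPLIT takes a Java regular expression. This one matches a comma together
-- with any whitespace around it, so "toy car" stays in one piece.
B = FOREACH A GENERATE STRSPLIT(line, '\\s*,\\s*');

DUMP B;
```

Note that STRSPLIT returns a single tuple holding all the pieces rather than a bag of one-field tuples. If you need the bag shape from the question, newer Pig releases also provide STRSPLITTOBAG, provided your version includes it.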

Upvotes: 1
