Reputation: 71
Hi everyone, I have found many examples of counting words, but I cannot find one for counting letters. I just want to split the words into letters and count them, but my code is wrong. Can someone help me with this? Thanks very much. This is my code:
A = load './in/*.txt';
B = FOREACH A GENERATE FLATTEN(TOKENIZE(LOWER((chararray)$0))) as words;
C = FOREACH B GENERATE FLATTEN(REGEX_EXTRACT_ALL(words, '([a-zA-Z])')) as letter;
D = group C by letter;
E = FOREACH D GENERATE COUNT(C), group;
DUMP E;
Upvotes: 4
Views: 1681
Reputation: 2328
Change the line that defines C to the following:
C = foreach B generate flatten(TOKENIZE(REPLACE(words,'','|'), '|')) as letter;
The trick I have used is to replace each letter boundary with a special character (|) and then tokenize with that character as the delimiter. You can also use an uncommon string sequence instead of the special character.
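For reference, here is a rough sketch of how the whole script might look with that change applied. It is not a tested solution. It assumes each input line arrives as the first field of the default loader, as in the question. As far as I know, REGEX_EXTRACT_ALL only returns groups when the pattern matches the entire string, which would explain why the original line C produced nothing for words longer than one letter. The FILTER step and the alias names (line, word, C_letters, total) are my own additions, meant to drop digits and punctuation that TOKENIZE leaves attached to words.

```pig
-- Load each line; the first field is treated as text (assumption: same input as the question).
A = LOAD './in/*.txt' AS (line:chararray);

-- Lowercase the line and split it into words.
B = FOREACH A GENERATE FLATTEN(TOKENIZE(LOWER(line))) AS word;

-- Insert '|' at every character boundary, then split on '|' to get single characters.
C = FOREACH B GENERATE FLATTEN(TOKENIZE(REPLACE(word, '', '|'), '|')) AS letter;

-- Hypothetical extra step: keep only the characters a-z.
C_letters = FILTER C BY letter MATCHES '[a-z]';

-- Group identical letters and count each group.
D = GROUP C_letters BY letter;
E = FOREACH D GENERATE group AS letter, COUNT(C_letters) AS total;

DUMP E;
```

If you want to count every character, including digits and punctuation, leave out the FILTER and group C directly.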
Upvotes: 0