Boost tokenizer returns the input unchanged

Question

I want to tokenize a large amount of Burmese text, so I tried using the Boost tokenizer.

The text I tried is ျခင္းခတ္ခဲ့တာလို႕. It should be tokenized into ျခင္း and င္းျခင္း, but the program just prints the input unchanged. Am I doing something wrong?

    #include <iostream>
    #include <string>
    #include <boost/tokenizer.hpp>

    int main(){
        using namespace std;
        using namespace boost;
        string s = "ျခင္းခတ္ခဲ့တာလို႕";
        tokenizer<> tok(s);
        // Print each token on its own line.
        for(tokenizer<>::iterator beg=tok.begin(); beg!=tok.end();++beg){
            cout << *beg << "\n";
        }
    }

The output should be a series of tokens such as ျခင္း and ခတ္ခဲ့တာလို႕, but currently the output is identical to the input.

If possible, I want to split this text into tokens at word boundaries.

Answers (1)

DEMO
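
A likely explanation, offered as a sketch rather than a definitive diagnosis: `tokenizer<>` defaults to `char_delimiters_separator<char>`, which splits on whitespace and punctuation characters. Burmese text is normally written without spaces between words, so the sample string contains no delimiter and comes back as a single token. The helper below is hypothetical (`split_on_ascii_delims` is not part of Boost) and only approximates that behaviour with the standard library, treating every non-ASCII byte as ordinary token content.

```cpp
#include <cctype>
#include <string>
#include <vector>

// Hypothetical helper, not a Boost API: split on ASCII whitespace and
// punctuation only, roughly what a delimiter-based tokenizer does by default.
std::vector<std::string> split_on_ascii_delims(const std::string& text) {
    std::vector<std::string> tokens;
    std::string current;
    for (unsigned char byte : text) {
        // Assumption: bytes >= 0x80 are parts of multi-byte UTF-8 sequences
        // and are never treated as delimiters.
        bool is_delim = byte < 0x80 && (std::isspace(byte) || std::ispunct(byte));
        if (is_delim) {
            // Close the current token, skipping empty ones.
            if (!current.empty()) {
                tokens.push_back(current);
                current.clear();
            }
        } else {
            current.push_back(static_cast<char>(byte));
        }
    }
    if (!current.empty()) {
        tokens.push_back(current);
    }
    return tokens;
}
```

Calling `split_on_ascii_delims("This is, a test")` yields four tokens, while a UTF-8 string with no ASCII space or punctuation yields exactly one, which matches the behaviour described in the question. Splitting Burmese at word boundaries generally needs dictionary-based segmentation; a library such as ICU (its `BreakIterator` word instance) is one option worth evaluating, assuming the text is in standard Unicode encoding.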
