Reputation: 58
How to split a string by ASCII character group with regex (Android/Java)
Actual String
"আমি আছি i am ইংরেজি থেকে বাংলা"
Expected Output
আমি আছি
i am
ইংরেজি থেকে বাংলা
Upvotes: 2
Views: 140
Reputation: 3446
You could always split on the following:
(?<=[\u0021-\u007E])\s+(?=[^\u0021-\u007E])|(?<=[^\u0021-\u007E])\s+(?=[\u0021-\u007E])
This splits on whitespace that is either preceded by a printable ASCII character (U+0021 to U+007E) and followed by a character outside that range, or preceded by a character outside that range and followed by a printable ASCII character. Of course, you can modify the Unicode ranges to suit your input, using a Unicode character table as a reference.
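For reference, here is a minimal sketch of how the pattern might be used from Java with `String.split`. The class name `SplitByScript` and the helper `splitByScript` are made up for illustration. Backslashes are doubled so that the regex engine, not the Java compiler, interprets the `\u` escapes.

```java
class SplitByScript {
    // Printable ASCII excluding space: U+0021 ('!') through U+007E ('~').
    // Split on whitespace that sits between a character inside this range
    // and a character outside it, in either order.
    private static final String BOUNDARY =
        "(?<=[\\u0021-\\u007E])\\s+(?=[^\\u0021-\\u007E])"
        + "|(?<=[^\\u0021-\\u007E])\\s+(?=[\\u0021-\\u007E])";

    // Hypothetical helper name, not part of any library.
    static String[] splitByScript(String text) {
        return text.split(BOUNDARY);
    }

    public static void main(String[] args) {
        String input = "আমি আছি i am ইংরেজি থেকে বাংলা";
        for (String part : splitByScript(input)) {
            System.out.println(part);
        }
    }
}
```

One caveat: the negated class `[^\u0021-\u007E]` also matches whitespace, so a run of two or more spaces between two ASCII words would be treated as a boundary as well. With the single-space input from the question this does not come up.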
Upvotes: 2