Dhinakar

Reputation: 4151

How to get the character length of a Unicode string, including spaces, in Java

I need to find the length of my string "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்". I get a length of 45, but I expect 59. I need to add conditions for spaces and dots (.) to the regular expression. My code:

import java.util.*;
import java.util.regex.*;

class UnicodeLength
{
    public static void main (String[] args)
    {
        String s="பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்பி நேற்று தேர்தல் ஆணையர் விசம்பத்";
        List<String> characters=new ArrayList<String>();
        Pattern pat = Pattern.compile("\\p{L}\\p{M}*");
        Matcher matcher = pat.matcher(s);
        while (matcher.find()) {
            characters.add(matcher.group());
        }

        // Test if we have the right characters and length
        System.out.println(characters);
        System.out.println("String length: " + characters.size());
    }
}
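To see where a count like 45 comes from, here is a minimal sketch that compares String.length(), which counts UTF-16 code units, with the number of matches of the question's pattern \p{L}\p{M}*, which groups each letter with its following combining marks but never matches a space or a dot. The class ClusterCountDemo, its helper countMatches, and the short sample string are hypothetical and only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class comparing three ways of "measuring" a string.
final class ClusterCountDemo {

    // The question's pattern: one letter plus any combining marks after it.
    static final String LETTER_CLUSTER = "\\p{L}\\p{M}*";
    // The answer's pattern: punctuation, a letter cluster, or a single space.
    static final String ANSWER_PATTERN = "\\p{P}|\\p{L}\\p{M}*| ";

    // Counts the non-overlapping matches of regex in s.
    static int countMatches(String regex, String s) {
        Matcher m = Pattern.compile(regex).matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // Hypothetical sample: PA + vowel sign AA, a space,
        // PA + vowel sign I, then a full stop.
        String s = "\u0BAA\u0BBE \u0BAA\u0BBF.";

        System.out.println("length()         = " + s.length());                       // count: 6
        System.out.println("letter clusters  = " + countMatches(LETTER_CLUSTER, s));  // count: 2
        System.out.println("answer's pattern = " + countMatches(ANSWER_PATTERN, s));  // count: 4
    }
}
```

The letter-cluster pattern silently skips the space and the dot, which is why the question's count comes out lower than expected.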

Answers (1)

Erik Gillespie

The code below worked for me. There were three issues that I fixed:

  1. I added a check for spaces to your regular expression.
  2. I added a check for punctuation to your regular expression.
  3. I replaced the string in your code with the one from your comment. They weren't the same: the string in your code is missing the dots in எம்.பி. and வி.சம்பத்.

Here's the code:

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeLength {
    public static void main(String[] args) {
        String s = "பாரதீய ஜனதா இளைஞர் அணி தலைவர் அனுராக்சிங் தாகூர் எம்.பி. நேற்று தேர்தல் ஆணையர் வி.சம்பத்";
        List<String> characters = new ArrayList<>();
        // Match punctuation, a letter with its combining marks, or a single space
        Pattern pat = Pattern.compile("\\p{P}|\\p{L}\\p{M}*| ");
        Matcher matcher = pat.matcher(s);
        while (matcher.find()) {
            characters.add(matcher.group());
        }
        // Test if we have the right characters and length
        int i = 1;
        for (String character : characters) {
            System.out.println(String.format("%d = [%s]", i++, character));
        }
        System.out.println("Characters Size: " + characters.size());
    }
}

It's worth pointing out that your code is remarkably similar to the solution to this SO question. One comment on that solution in particular pointed me to the missing punctuation check in your code and made me notice that the string in your comment didn't match the string in your code.
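A different technique from the one in the answer, sketched here under the assumption of Java 9 or later: the regex construct \X matches one extended grapheme cluster, so counting its matches includes spaces and punctuation without listing them as separate alternatives. The class GraphemeCount and the method countGraphemes are hypothetical. For Tamil text this will often agree with the \p{L}\p{M}* approach, but grapheme rules follow the Unicode version of your JDK, so compare it against the answer's count of 59 before relying on it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: counts user-perceived characters with \X.
final class GraphemeCount {

    // \X matches one extended grapheme cluster (Java 9 or later), so a letter
    // with its combining marks, a space, and a punctuation mark each count once.
    private static final Pattern GRAPHEME = Pattern.compile("\\X");

    static int countGraphemes(String s) {
        Matcher m = GRAPHEME.matcher(s);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // 'e' + COMBINING ACUTE ACCENT (U+0301), a space, 'a', and a full stop.
        System.out.println(countGraphemes("e\u0301 a."));   // count: 4
        // Hypothetical Tamil sample: PA + vowel sign AA, a space,
        // PA + vowel sign I, then a full stop.
        System.out.println(countGraphemes("\u0BAA\u0BBE \u0BAA\u0BBF."));
    }
}
```

Passing the question's full string to countGraphemes is a quick way to check whether \X matches the answer's result on a given JDK.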
