reigeki

Reputation: 391

Java: how to parse smiley notations from a String

I want to parse a string that contains emoticons such as ":)", ":p", "!" and "?" as well as ordinary words. For example, given the string "How dare you! You have lost him two days ago:'(", I want to get this result:

How
dare
you
!
You
have
lost
him
two
days
ago
:'(

I use StringTokenizer to split the sentence on a set of separator characters, but that way I lose the emoticons. Thanks.

The code that I use:

import java.util.ArrayList;
import java.util.StringTokenizer;

public class FullParser {
    private String sentence;
    // Every character in this string is treated as a separator.
    private String separator="' ,.:!()@/<>";

    private ArrayList<String> mywords;

    public FullParser(String sentence){
        this.sentence=sentence;
        mywords=new ArrayList<String>();
        separator+='"';
    }
    public void parsing(){
        // returnDelims=true, so separators come back as one-character tokens.
        StringTokenizer st = new StringTokenizer( sentence, separator, true );

        while ( st.hasMoreTokens() ) {
            String token = st.nextToken();
            // Skip single-character separator tokens; this is where the emoticons get lost.
            if (!( token.length() == 1 && separator.indexOf( token.charAt( 0 ) ) >= 0 )) {
                //Log.i("PARSER",token);
                mywords.add(token);
            }
        }
    }
    public ArrayList<String> getmyWords(){
        return mywords;
    }
}

Upvotes: 1

Views: 1347

Answers (3)

Bernhard Barker

Reputation: 55619

I'm not sure whether this will answer your question, but, just to show off the power of regular expressions, here's a one-line solution: (reasonably tested)

sentence.split(" |(?<! |\\p{Punct})(?=\\p{Punct})|(?<=\\p{Punct})(?!\\p{Punct})");

\\p{Punct} matches any single punctuation character; if you want to be more specific, you can use [',\\.:!()@/<>] instead, which matches any one of these characters: ',.:!()@/<>.
(?<!...) is a negative look-behind: the preceding characters must not match this.
(?=...) is a positive look-ahead: the following characters must match this.
(?<=...) is a positive look-behind: the preceding characters must match this.
(?!...) is a negative look-ahead: the following characters must not match this.
The space is a literal space.
| means "OR": either what appears to its left or what appears to its right, up to the nearest enclosing bracket.
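
As a rough check of the one-liner, here is a minimal, self-contained sketch that applies it to the sentence from the question. The class name SplitDemo and the helper tokenize are made up for illustration; only String.split and the pattern itself come from this answer.

```java
import java.util.Arrays;
import java.util.List;

class SplitDemo {
    // The pattern from the answer, unchanged.
    static final String REGEX =
        " |(?<! |\\p{Punct})(?=\\p{Punct})|(?<=\\p{Punct})(?!\\p{Punct})";

    // Hypothetical helper: splits a sentence into words and runs of punctuation.
    static List<String> tokenize(String sentence) {
        return Arrays.asList(sentence.split(REGEX));
    }

    public static void main(String[] args) {
        String sentence = "How dare you! You have lost him two days ago:'(";
        // Print one token per line, in the layout the question asks for.
        for (String token : tokenize(sentence)) {
            System.out.println(token);
        }
    }
}
```

Consecutive punctuation characters are never split apart, which is why :'( stays in one piece. The same rule would also keep something like !:) together, so this is a heuristic rather than a lookup against a known list of emoticons.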

Why it works requires a fair bit of thought.

I had to complicate it a little more than I would've liked because there were some cases which didn't work.

Test.

See this for more information on Java regular expressions.

Upvotes: 1

Kshitij

Reputation: 8634

What you can do is store all the emoticons in an array. You will need to escape their special characters so that replaceAll does not interpret them as regex syntax. Then loop through the emoticons and add a SPACE before each one wherever it occurs in the sentence.

This lets you split the sentence on SPACE afterwards. You can also remove any double SPACE that might have been introduced.

See the code below (not tested, might contain syntax errors):

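import java.util.ArrayList;
import java.util.Arrays;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// These members are meant to live inside the FullParser class from the question.
private static final String SPACE =" ";
// Emoticons are strings, so they need double quotes; the rest of the list is elided, as in the original.
String[]  emotionList = new String[]{":P", ":)", "!" /* , .... */};

public void parsing(){
   for(String s:emotionList){ //add space before each emotion.
      // Pattern.quote escapes regex metacharacters such as ")" so the emoticon is matched literally.
      sentence=sentence.replaceAll(Pattern.quote(s), Matcher.quoteReplacement(SPACE+s));
   }

   sentence=sentence.replaceAll(SPACE+"+", SPACE);//optional - collapse repeated SPACEs into a single SPACE.
   // Arrays.asList returns a fixed-size List, so copy it into the ArrayList field.
   mywords = new ArrayList<String>(Arrays.asList(sentence.split(SPACE)));
}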
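
For something that can be run on its own, here is a minimal sketch of the same idea as a standalone class. The class name EmoticonSpacer, the method tokenize and the contents of the emoticon list are assumptions made for illustration. It also pads each emoticon with a space on both sides, a small departure from the description above, so that an emoticon followed directly by a word is separated too.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonSpacer {
    // Hypothetical emoticon list; extend it as needed.
    static final String[] EMOTICONS = {":'(", ":)", ":P", "!", "?"};

    static List<String> tokenize(String sentence) {
        for (String e : EMOTICONS) {
            // Pattern.quote makes replaceAll treat the emoticon as literal text;
            // quoteReplacement does the same for the replacement string.
            sentence = sentence.replaceAll(
                Pattern.quote(e), Matcher.quoteReplacement(" " + e + " "));
        }
        // Trim the padding at the ends and split on runs of spaces.
        return new ArrayList<>(Arrays.asList(sentence.trim().split(" +")));
    }

    public static void main(String[] args) {
        System.out.println(tokenize("How dare you! You have lost him two days ago:'("));
    }
}
```

Because the emoticons are replaced one after another, an entry that is a substring of another entry would break the longer one apart, so a list like this only works while its entries do not overlap.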

Upvotes: 0

Satheesh Cheveri

Reputation: 3679

Ideally I would suggest using a regular expression, but the pattern gets complex if you really want to support many smileys/expressions (there are 100+ smileys in everyday use).

Maybe you can store the possible expressions/smileys as Strings in an ArrayList, then search the given string for each list element and put every match on its own line. For example:

    // initialisation - can be done once on startup / values can be fetched from a db
    ArrayList<String> list = new ArrayList<String>();
    list.add(":)");
    list.add("!");
    list.add("?");

    // Whenever you want to parse the String
    String input=" Hello :) How are you ? I am :) not fine! ha ha!";
    System.out.println(input);
    for(String exp:list){
        // replace() takes literal text, so no regex escaping is needed here
        input = input.replace(exp, "\n"+exp+"\n");
    }
    System.out.println(input);
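
The regular-expression route mentioned at the start of this answer does not have to be written by hand. One possible sketch, assuming the emoticons are kept in a list as above, builds the pattern from that list and collects tokens in a single pass. The class name EmoticonMatcher and the method tokenize are hypothetical; Pattern.quote and Matcher.find are standard java.util.regex calls.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class EmoticonMatcher {
    // Returns emoticons and words in the order they appear in the input.
    static List<String> tokenize(String input, List<String> emoticons) {
        List<String> sorted = new ArrayList<>(emoticons);
        // Longest first, so a long emoticon wins over a shorter one sharing its prefix.
        sorted.sort((a, b) -> b.length() - a.length());

        StringBuilder alternation = new StringBuilder();
        for (String e : sorted) {
            // Pattern.quote turns each emoticon into a literal regex fragment.
            alternation.append(Pattern.quote(e)).append('|');
        }
        // Match any listed emoticon, otherwise a run of word characters.
        Pattern pattern = Pattern.compile(alternation + "\\w+");

        List<String> tokens = new ArrayList<>();
        Matcher m = pattern.matcher(input);
        while (m.find()) {
            tokens.add(m.group());
        }
        return tokens;
    }

    public static void main(String[] args) {
        List<String> list = Arrays.asList(":)", "!", "?");
        System.out.println(tokenize(" Hello :) How are you ? I am :) not fine! ha ha!", list));
    }
}
```

Anything that is neither a listed emoticon nor a word character (spaces, commas and so on) is simply skipped, so unlisted punctuation is dropped rather than returned as a token.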

Upvotes: 0
