Fox

Reputation: 9444

Remove extra punctuation from string while keeping "smileys"?

I am running into some problems with a regular expression. Can you please help me out? This is the problem I am trying to solve:

Input - :,... :(..:::))How are you today?..:(
Output - :( :) How are you today :(

Basically, I want to remove punctuation such as . , : ; from the input string (replace it with an empty string), but I want to keep the smileys :) and :(. I have written the following code, but it is not working.

    String s = ":,... :(..:::))How are you today?..:( ";
    Pattern pattern = Pattern.compile("^(\\Q:)\\E|\\Q:(\\E)(\\p{P}+)");
    Matcher matcher = pattern.matcher(s);
    s = matcher.replaceAll("");

Thank you.

Upvotes: 2

Views: 703

Answers (3)

alain.janinm

Reputation: 20065

You can try this:

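    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    public class KeepSmileys {
        public static void main(String[] args) {
            String s = ":,...:(..:::))How are you today?..:( ";
            // keep ":)", ":(" and every run of non-punctuation characters (spaces included)
            Pattern pattern = Pattern.compile("(:\\)|:\\(|[^\\p{Punct}]+|\\s+)");
            Matcher matcher = pattern.matcher(s);
            StringBuilder res = new StringBuilder();
            while (matcher.find()) {
                res.append(matcher.group(0)); // concatenate the kept pieces in order
            }
            System.out.println(res);
        }
    }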

Result:

:(:)How are you today:(

Upvotes: 1

Bart Kiers

Reputation: 170258

Try something like this:

[\p{P}&&[^:()]]|:(?![()])|(?<!:)[()]

A quick breakdown:

[\p{P}&&[^:()]]    # any punctuation mark except ':', '(' and ')'
|                  # OR
:(?![()])          # a ':' not followed by '(' or ')'
|                  # OR
(?<!:)[()]         # a '(' or ')' not preceded by ':'

Note that the [ ... && [^ ... ]] syntax (set subtraction, written as an intersection with a negated class) is not portable: many other regex flavors do not support it. See: http://docs.oracle.com/javase/6/docs/api/java/util/regex/Pattern.html
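Because this pattern matches only the characters to delete, it can be passed straight to replaceAll. Below is a minimal sketch of that usage on the question's input; the class name SmileyFilter and the helper removeExtraPunctuation are hypothetical, and the backslashes are doubled because the pattern sits in a Java string literal.

```java
import java.util.regex.Pattern;

public class SmileyFilter {
    // The pattern from this answer, escaped for a Java string literal.
    // It matches every punctuation character that is not part of ":)" or ":(".
    private static final Pattern EXTRA_PUNCT =
            Pattern.compile("[\\p{P}&&[^:()]]|:(?![()])|(?<!:)[()]");

    // Hypothetical helper: deletes every match of the pattern.
    static String removeExtraPunctuation(String input) {
        return EXTRA_PUNCT.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        String s = ":,... :(..:::))How are you today?..:( ";
        // brackets make the leading and trailing spaces visible
        System.out.println("[" + removeExtraPunctuation(s) + "]");
    }
}
```

The lookarounds are evaluated against the original input, so the first ')' of ':))' is kept (it follows a ':') while the second one is deleted. The smileys end up adjacent because the input has no space between them.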

Upvotes: 2

Brigand

Reputation: 86260

I tested in JavaScript with this:

[.,:;](?![)(])

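So that would translate to something like one of these in Java:

\\p{Punct}(?![)(])
\\p{P}(?![)(])

Note that in Java both \p{Punct} and \p{P} also match '(' and ')', so with these versions the parenthesis of a smiley such as :( is removed whenever the next character is not another parenthesis.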
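Character classes and negative lookahead use the same syntax in Java, so the JavaScript pattern above can also be compiled unchanged in Java. A minimal sketch run on the question's input; the class JsPatternInJava and the helper strip are hypothetical names.

```java
import java.util.regex.Pattern;

public class JsPatternInJava {
    // The JavaScript pattern from this answer; no backslashes are needed
    // because it contains no escape sequences.
    private static final Pattern PUNCT_NOT_BEFORE_PAREN =
            Pattern.compile("[.,:;](?![)(])");

    // Hypothetical helper: deletes '.', ',', ':' and ';' unless followed by '(' or ')'.
    static String strip(String input) {
        return PUNCT_NOT_BEFORE_PAREN.matcher(input).replaceAll("");
    }

    public static void main(String[] args) {
        // brackets make the leading and trailing spaces visible
        System.out.println("[" + strip(":,... :(..:::))How are you today?..:( ") + "]");
    }
}
```

Only '.', ',', ':' and ';' are removed, so on this input the '?' and the surplus ')' remain; the class would have to be extended to cover them.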

Upvotes: 1
