helpdesk

Reputation: 2074

word extraction and splitting using Java regex

I have a string "'GLO', FLO". I want a regex that checks each word in the string and: - if a word begins and ends with a single quote, replaces the single quotes with spaces; - if a comma is encountered between words, separates the two words with a space.

So, in the end, I should get GLO FLO.

Any help on how to do this using replaceAll() method on the string?

This regex didn't do it for me: "'([^' ]+)|\\s+'"

public static void displaySplitString(final String str) {
   String pattern1 = "^'?(\\w+)'?,\\s+(\\w+)$";
   StringTokenizer strTok = new StringTokenizer(str, " , ");
   while (strTok.hasMoreTokens()) {
     String delim = (strTok.nextToken());
     delim.replaceAll(pattern1, "$1$2");
     System.out.println(delim);
   }
 } //in main method displaySplitString("'GLO', FLO");

Upvotes: 2

Views: 93

Answers (1)

Wiktor Stribiżew

Reputation: 626690

Here is the snippet that should get you going:

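// Requires: import java.util.StringTokenizer;
public static void displaySplitString(String str)
    {
        // Optional quotes around the first word, followed by a non-whitespace character
        String pattern1 = "^'?(\\w+)'?(?=\\S)";
        // Keep only the captured word, padded with spaces
        str = str.replaceAll(pattern1, " $1 ");
        // Each character of " , " (space and comma) acts as a delimiter
        StringTokenizer strTok = new StringTokenizer(str, " , ");
        while (strTok.hasMoreTokens())
        {
            String token = strTok.nextToken();
            System.out.println(token);
        }
    }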

Here,

  • I removed final from the str parameter declaration so that str can be reassigned inside the method
  • I am using the first regex ^'?(\\w+)'?(?=\\S) to remove potential single quotes from around the first word
  • Since the StringTokenizer already splits on spaces and commas, just 2 lines inside the while block are enough.
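To check the steps above without reading console output, here is a minimal sketch that applies the same replaceAll() call and the same tokenizer, but collects the tokens into a list. The class name SplitStringDemo and the helper splitWords are hypothetical names introduced only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

class SplitStringDemo {
    // Hypothetical helper: same steps as displaySplitString, but returns the tokens.
    static List<String> splitWords(String str) {
        // Strip the optional quotes around the first word and pad it with spaces.
        str = str.replaceAll("^'?(\\w+)'?(?=\\S)", " $1 ");
        // Every character of the delimiter string (space and comma) separates tokens.
        StringTokenizer strTok = new StringTokenizer(str, " ,");
        List<String> words = new ArrayList<>();
        while (strTok.hasMoreTokens()) {
            words.add(strTok.nextToken());
        }
        return words;
    }

    public static void main(String[] args) {
        // prints: GLO FLO
        System.out.println(String.join(" ", splitWords("'GLO', FLO")));
    }
}
```

Note that the tokenizer alone would split on the comma and the space but leave the quotes on 'GLO'; the replaceAll() step is what removes them.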

The regex means:

  • ^ - Start looking for the match at the very start of the string
  • '? - match 0 or 1 single quote
  • (\\w+) - match and capture 1 or more word characters, i.e. letters, digits or underscores (we'll refer to them as $1 in the replacement pattern)
  • '? - match 0 or 1 single quote
  • (?=\\S) - a positive lookahead: match only if the next character is a non-whitespace character, without consuming it. If a comma always follows the first word, you can replace this lookahead with a literal , instead.
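The breakdown above can be checked with a small sketch that shows what the pattern matches and what ends up in group 1, both for the lookahead version and for the literal-comma variant. The class name RegexBreakdownDemo, the constants and the helper firstWord are assumptions made for this illustration only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class RegexBreakdownDemo {
    // The pattern from the answer: the lookahead checks the next character but does not consume it.
    static final Pattern LOOKAHEAD = Pattern.compile("^'?(\\w+)'?(?=\\S)");
    // Variant with a literal comma: here the comma is part of the match.
    static final Pattern COMMA = Pattern.compile("^'?(\\w+)'?,");

    // Returns {whole match, group 1}, or null if the pattern does not match.
    static String[] firstWord(Pattern pattern, String input) {
        Matcher m = pattern.matcher(input);
        return m.find() ? new String[] { m.group(), m.group(1) } : null;
    }

    public static void main(String[] args) {
        String input = "'GLO', FLO";
        String[] a = firstWord(LOOKAHEAD, input);
        System.out.println(a[0] + " -> " + a[1]); // prints: 'GLO' -> GLO
        String[] b = firstWord(COMMA, input);
        System.out.println(b[0] + " -> " + b[1]); // prints: 'GLO', -> GLO
    }
}
```

In both cases $1 holds only GLO, so the replacement " $1 " drops the quotes. The difference is that the comma variant also removes the comma, which is harmless here because the tokenizer treats both commas and spaces as delimiters.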

Upvotes: 3
