Chaos

Reputation: 11721

Java Regex to Match words + spaces

I am trying to construct a simple regex to match words + whitespace in Java, but I got confused trying to work it out. There are a lot of similar examples on this site, but the answers mostly give the regex itself without explaining how it is constructed.

What I'm looking for is the Line of Thought behind forming the regular expression.

Sample Input String:

String Tweet = "\"Whole Lotta Love\" - Led Zeppelin";

which when printed is: "Whole Lotta Love" - Led Zeppelin

Problem Statement:

I want to find out if a String has a quotation in it. In the above sample string, Whole Lotta Love is the quotation.

What I've tried:

My first approach was to match anything between two double quotes, so I came up with the following regex:

"\"(\\w+\")" and "\"(^\")"

But this approach only works if there are no spaces between the two double quotes, like:

"Whole" Lotta Love

So I tried to modify my regex to match spaces, and this is where I got lost.

I tried the following, but none of them match:

"\"(\\w+?\\s+\")" , "\"(\\w+)(\\s+)\"" , "\"(\\w+)?(\\s+)\""

I would appreciate it if someone could help me figure out how to construct this.

Upvotes: 1

Views: 18446

Answers (4)

Sandun Susantha

Reputation: 1140

[\w\s]+

We can use this when we need to pull out whole sentences. For example, to grab the sentence from "hi I am Sandun", we can use "+[\w\s]+".
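A minimal sketch of how that character class could be applied, assuming the sentence sits between double quotes inside a longer string. The class name CharClassDemo, the method firstQuoted, and the surrounding test sentence are made up for illustration and are not part of the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CharClassDemo {
    // A quote, one or more word/whitespace characters (captured), then a closing quote.
    private static final Pattern QUOTED = Pattern.compile("\"([\\w\\s]+)\"");

    // Hypothetical helper: returns the text inside the first quoted run, or null if there is none.
    static String firstQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // prints: hi I am Sandun
        System.out.println(firstQuoted("He said \"hi I am Sandun\" and left"));
    }
}
```

Because the class only allows word characters and whitespace, a quotation containing punctuation would not be picked up by this pattern.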

Upvotes: 0

Mena

Reputation: 48404

The simplest way would be to use a while loop that looks for anything in between two quotes in your input, so that you also catch multiple quoted expressions.

My example here accepts anything in between two quotes. You can refine it to accept only alphabetic characters and spaces.

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String quotedTweet = "\"Whole Lotta Love\" - Led Zeppelin";
String unquotedTweet = "Whole Lotta Love from Led Zeppelin";
String multipleQuotes = "\"Whole Lotta Love\" - \"Led\" Zeppelin";
// commented Pattern for only alphabetics or spaces
// Pattern pattern = Pattern.compile("\"([\\p{Alpha}\\p{Space}]+?)\"");
Pattern pattern = Pattern.compile("\"(.+?)\"");
Matcher matcher = pattern.matcher(quotedTweet);
while (matcher.find()) {
    // will find "Whole Lotta Love"
    System.out.println(matcher.group(1));
}
matcher = pattern.matcher(unquotedTweet);
while (matcher.find()) {
    // will find nothing
    System.out.println(matcher.group(1));
}
matcher = pattern.matcher(multipleQuotes);
while (matcher.find()) {
    // Will find "Whole Lotta Love" and "Led"
    System.out.println(matcher.group(1));
}

Edit: neither this example nor the commented variant rejects a whitespace-only quotation, as in " ". Let me know if that's a requirement - the Pattern would be a bit more complicated in that case.

Output:

Whole Lotta Love
Whole Lotta Love
Led
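If whitespace-only quotations such as " " do need to be rejected, one possible sketch is shown here. Note that it does not use a more complicated Pattern; it swaps in a simpler technique, matching every quoted run and then filtering the blank ones in code. The class name NonBlankQuotes and the method nonBlankQuotes are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class NonBlankQuotes {
    // Anything except a double quote, between two double quotes.
    private static final Pattern QUOTED = Pattern.compile("\"([^\"]+)\"");

    // Hypothetical helper: collects quoted runs, skipping those made only of whitespace.
    static List<String> nonBlankQuotes(String input) {
        List<String> found = new ArrayList<>();
        Matcher m = QUOTED.matcher(input);
        while (m.find()) {
            String inner = m.group(1);
            // The blank quotation is still consumed by find(), so its closing
            // quote cannot be mistaken for the opening quote of the next one.
            if (!inner.trim().isEmpty()) {
                found.add(inner);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // prints: [Led]
        System.out.println(nonBlankQuotes("\" \" - \"Led\" Zeppelin"));
    }
}
```

Filtering after the match keeps the quotes paired correctly, which is harder to guarantee when the pattern itself refuses to match the blank quotation.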

Upvotes: 2

ddr

Reputation: 201

You almost had it. Your regexes would match alphanumeric characters followed by spaces, like this:

"Whole "

but not any alphanumeric chars after that. zEro is almost right, but you probably want to use a capture like this:

"\"([\\w\\s]+)\""

This matches one or more whitespace or alphanumeric characters between the quotes. Note that \w also matches the underscore (_).

If you want to be more general, you could use

"\"([^\"]+)\""

which will match any run of characters other than double quotes. For instance, "Who's on first?" (including the quotes) would be matched by the second regex but not by the first, since the quotation contains punctuation.
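The difference between the two regexes can be checked with a short sketch like the following. The class name QuoteRegexComparison, the helper hasQuotation, and the trailing words "he asked" in the second test string are invented for the example.

```java
import java.util.regex.Pattern;

class QuoteRegexComparison {
    // First regex: only word characters and whitespace between the quotes.
    static final Pattern WORDS_AND_SPACES = Pattern.compile("\"([\\w\\s]+)\"");
    // Second regex: anything except a double quote between the quotes.
    static final Pattern ANYTHING_BUT_QUOTES = Pattern.compile("\"([^\"]+)\"");

    // Hypothetical helper: true if the input contains at least one match.
    static boolean hasQuotation(Pattern pattern, String input) {
        return pattern.matcher(input).find();
    }

    public static void main(String[] args) {
        String tweet = "\"Whole Lotta Love\" - Led Zeppelin";
        String punctuated = "\"Who's on first?\" he asked";

        System.out.println(hasQuotation(WORDS_AND_SPACES, tweet));         // true
        System.out.println(hasQuotation(ANYTHING_BUT_QUOTES, tweet));      // true
        System.out.println(hasQuotation(WORDS_AND_SPACES, punctuated));    // false
        System.out.println(hasQuotation(ANYTHING_BUT_QUOTES, punctuated)); // true
    }
}
```

Using find() rather than matches() answers the original yes/no question, since the quotation is only part of the string.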

Upvotes: 4

Casimir et Hippolyte

Reputation: 89557

You can use this:

\"(?>\\w+ *)+\"

or a character class, as zEro suggests.
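A small sketch of the atomic-group version in use, assuming the goal is to find the quotation including its quotes. The class name AtomicGroupDemo and the method findQuoted are hypothetical. The (?>...) group stops the engine from backtracking into a "word plus spaces" chunk once it has matched, which should keep the nested quantifiers from re-trying many splits when the closing quote is missing.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AtomicGroupDemo {
    // Quote, then one or more atomic "word plus optional spaces" chunks, then quote.
    private static final Pattern QUOTED = Pattern.compile("\"(?>\\w+ *)+\"");

    // Hypothetical helper: returns the whole match including the quotes, or null.
    // (?>...) is not a capturing group, so there is no group(1) to read here.
    static String findQuoted(String input) {
        Matcher m = QUOTED.matcher(input);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(findQuoted("\"Whole Lotta Love\" - Led Zeppelin")); // "Whole Lotta Love"
        System.out.println(findQuoted("\" \" - Led Zeppelin"));                // null
    }
}
```

Since every chunk must start with at least one word character, this pattern also rejects a quotation that contains only spaces.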

Upvotes: 1
