Reputation: 410
I am trying to capture only proper words from a line of text using regex (i.e. periods, commas, parentheses, etc. are not desired). For example, if the input line is:
So she was considering in her own mind (as well as she could),
I would like to capture:
So
she
was
considering
in
....
Does anybody know a way to do this? I'm new to regex unfortunately :S
Cheers!
Upvotes: 1
Views: 74
Reputation: 41848
This is the regex you need:
\b[a-zA-Z]+\b
Explanation
\b is a word boundary: it matches a position where one side is a word character (a letter, digit, or underscore) and the other side is not (for instance a space character, or the beginning of the string).
[a-zA-Z] matches one character in the ranges a-z and A-Z.
The + quantifier says that we must match what precedes one or more times.
The closing \b boundary ensures that our word is finished. Together, the two boundaries ensure we have a complete word.
In Java
In the comments, you mentioned that you'd like to see a list. You can use this:
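import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
// subjectString is the input line to search
List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\b[a-z]+\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group());
}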
Note that I made the pattern case-insensitive, which is why [a-z] alone is enough here.
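If it helps to see everything in one place, here is a minimal, self-contained sketch that runs the first regex against the example line from the question. The class name WordExtractor and the helper extractWords are made up for illustration; only Pattern and Matcher come from the standard library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // Compiled once and reused; same pattern as the regex at the top of this answer.
    private static final Pattern WORD = Pattern.compile("\\b[a-zA-Z]+\\b");

    // Hypothetical helper: returns every run of ASCII letters that is
    // delimited by word boundaries, in order of appearance.
    static List<String> extractWords(String line) {
        List<String> words = new ArrayList<>();
        Matcher m = WORD.matcher(line);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        String line = "So she was considering in her own mind (as well as she could),";
        // prints [So, she, was, considering, in, her, own, mind, as, well, as, she, could]
        System.out.println(extractWords(line));
    }
}
```

Two things to be aware of with this pattern: a token that mixes letters and digits, such as abc123, is skipped entirely because there is no boundary between the letters and the digits, and a contraction like don't comes back as two pieces, don and t. Whether that is what you want depends on your definition of a "proper word".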
Upvotes: 2