nelt22

Reputation: 410

How to capture just words in a string line

I am trying to capture only proper words from a line of text using regex (periods, commas, parentheses, etc. are not desired). For example, if the input line is:

So she was considering in her own mind (as well as she could),

I would like to capture:

So 
she 
was 
considering 
in
....

Does anybody know a way to do this? I'm new to regex, unfortunately :S

Cheers!

Upvotes: 1

Views: 74

Answers (1)

zx81

Reputation: 41848

This is the regex you need:

\b[a-zA-Z]+\b

See demo.

Explanation

  • \b is a word boundary. It matches a position (not a character) where one side is a word character (a letter, digit or underscore) and the other side is not, for instance a space, a punctuation mark, or the beginning or end of the string
  • The character class [a-zA-Z] matches a single character in the range a-z or A-Z
  • The + quantifier means that the preceding token must match one or more times
  • The second \b ensures that the word is finished. Together, the two boundaries ensure that we match a complete word rather than part of a longer one.
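A small sketch of what the word-boundary rule above means in practice. The class name WordBoundaryDemo, the helper findAll and the sample input are illustrative only, not part of the original answer. Because digits and underscores count as word characters, \b[a-zA-Z]+\b skips tokens such as abc123 or snake_case entirely, while an apostrophe (a non-word character) splits it's into two matches. Dropping the \b anchors instead returns every run of ASCII letters.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordBoundaryDemo {
    // Collects every match of the given regex in the input, in order.
    static List<String> findAll(String regex, String input) {
        List<String> out = new ArrayList<>();
        Matcher m = Pattern.compile(regex).matcher(input);
        while (m.find()) {
            out.add(m.group());
        }
        return out;
    }

    public static void main(String[] args) {
        String input = "it's abc123 snake_case";
        // With the boundaries: digits and '_' are word characters, so there
        // is no boundary right after "abc" or "snake" and those tokens fail.
        System.out.println(findAll("\\b[a-zA-Z]+\\b", input)); // [it, s]
        // Without the boundaries: every maximal run of ASCII letters matches.
        System.out.println(findAll("[a-zA-Z]+", input)); // [it, s, abc, snake, case]
    }
}
```

Whether tokens like abc123 should be skipped or partially captured depends on what counts as a "proper word" for your input.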

In Java

In the comments, you mentioned that you'd like to see a list. You can use this:

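import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

List<String> matchList = new ArrayList<String>();
Pattern regex = Pattern.compile("\\b[a-z]+\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);
Matcher regexMatcher = regex.matcher(subjectString); // subjectString is the input line
while (regexMatcher.find()) {
    matchList.add(regexMatcher.group()); // each match is one word
}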

Note that I made the pattern case-insensitive, so [a-z] also matches uppercase letters, just as [a-zA-Z] does in the pattern above.
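To try the snippet end to end, here is a minimal self-contained sketch that wraps the same pattern in a method and runs it on the line from the question. The class name WordExtractor and the method name words are hypothetical; the pattern and flags are the ones from the answer, compiled once and reused.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WordExtractor {
    // Same pattern and flags as above; compiled once so repeated calls are cheap.
    private static final Pattern WORD =
            Pattern.compile("\\b[a-z]+\\b", Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE);

    // Returns the words of the line in order, with punctuation dropped.
    static List<String> words(String line) {
        List<String> matchList = new ArrayList<>();
        Matcher regexMatcher = WORD.matcher(line);
        while (regexMatcher.find()) {
            matchList.add(regexMatcher.group());
        }
        return matchList;
    }

    public static void main(String[] args) {
        String line = "So she was considering in her own mind (as well as she could),";
        // Prints one word per line: So, she, was, ..., she, could
        for (String word : words(line)) {
            System.out.println(word);
        }
    }
}
```

The parentheses and the trailing comma never appear in the result, because the character class only admits letters and the boundaries sit between a letter and the punctuation.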

Upvotes: 2
