AJS

Reputation: 11

Java - Search for words having more than 1 capital letter

Just need your help regarding a task to search in Java. I need to read a line from a file and make a list of all the words that have more than 1 capital letter in them.

For example, if the line is: There are SeVen Planets In this UniverSe

The result should be: SeVen and UniverSe

I am able to read the line and split it into words, but somehow I cannot find the correct regular expression to search for these words.

The following is a small example I used but it returns false although I think it should return true.

System.out.println("ThiS".matches("[A-Z]{2,}"));

Can anyone please have a look at this and suggest ways to achieve my result? Appreciate any help.

Thanks,

AJ

Upvotes: 1

Views: 4185

Answers (8)

Om Sao

Reputation: 7663

You can use this regex:

"SeVen".matches("[A-Z].*[A-Z][a-zA-Z]*") //true

"SeveNEight".matches("[A-Z].*[A-Z][a-zA-Z]*") //true

"seVeneight".matches("[A-Z].*[A-Z][a-zA-Z]*") //false

Upvotes: 0

Roland

Reputation: 1

I use this regex: /[A-Z].*[A-Z]+/

Upvotes: 0

mikej

Reputation: 66293

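[A-Z]{2,} means 2 or more consecutive upper-case letters. You could use [A-Z].*[A-Z], which allows any other characters to appear between the two upper-case letters. Note that String.matches has to match the entire string, so with matches you would write .*[A-Z].*[A-Z].* instead.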

Alternatively, you don't really need to use regex for this. If you prefer you could just iterate over each character in the string and use Character.isUpperCase and count the number of matching characters.
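
A minimal sketch of the non-regex approach, assuming words are separated by whitespace. The class and method names (UpperCaseCounter, countUpperCase, wordsWithMultipleCapitals) are made up for illustration:

```java
import java.util.ArrayList;
import java.util.List;

final class UpperCaseCounter {

    // Counts the upper-case characters in a single word.
    static int countUpperCase(String word) {
        int count = 0;
        for (int i = 0; i < word.length(); i++) {
            if (Character.isUpperCase(word.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    // Splits the line on whitespace and keeps words with more than one capital.
    static List<String> wordsWithMultipleCapitals(String line) {
        List<String> result = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (countUpperCase(word) > 1) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [SeVen, UniverSe]
        System.out.println(
            wordsWithMultipleCapitals("There are SeVen Planets In this UniverSe"));
    }
}
```

Unlike [A-Z], Character.isUpperCase also counts upper-case letters outside ASCII, which may or may not be what you want.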

Upvotes: 7

Mark Peters

Reputation: 81104

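    import java.util.regex.Matcher;
    import java.util.regex.Pattern;

    // Two upper-case letters anywhere inside a run of word characters.
    Pattern pat = Pattern.compile("\\w*[A-Z]\\w*[A-Z]\\w*");
    Matcher matcher = pat.matcher("There are SeVen Planets In this UniverSe");
    while ( matcher.find() ) {
        System.out.println(matcher.group());
    }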

Prints

SeVen
UniverSe

I'm horrible with regex, though, so there's probably a simpler way. This way is really easy to understand: start at the beginning of a word, match 0 or more word characters, then an upper-case letter, then 0 or more word characters, then another upper-case letter, then 0 or more word characters.

Upvotes: 0

Tim Pietzcker

Reputation: 336368

\b(?:[a-z]*[A-Z]){2,}[a-z]*\b

will match words that contain at least two uppercase letters. (With {2} instead of {2,}, a word with three or more capitals, such as SEVen, would not match.)

If you want to allow words that contain letters outside ASCII, use

\b(?:\p{Ll}*\p{Lu}){2,}\p{Ll}*\b

Of course, in a Java string, you need to escape (double) the backslashes.

So you get:

Pattern regex = Pattern.compile("\\b(?:\\p{Ll}*\\p{Lu}){2,}\\p{Ll}*\\b");
Matcher regexMatcher = regex.matcher(subjectString);
while (regexMatcher.find()) {
    // matched text: regexMatcher.group()
    // match start: regexMatcher.start()
    // match end: regexMatcher.end()
}
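
For completeness, here is a sketch that wraps this kind of pattern in a method returning the list the question asks for. It uses the {2,} form of the quantifier so that words with three or more capitals are also picked up. The class and method names (MultiCapitalWords, find) are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class MultiCapitalWords {

    // Letters-only words that contain two or more upper-case letters.
    private static final Pattern MULTI_CAPITAL =
        Pattern.compile("\\b(?:\\p{Ll}*\\p{Lu}){2,}\\p{Ll}*\\b");

    // Scans the whole line; no need to split it into words first.
    static List<String> find(String line) {
        List<String> words = new ArrayList<>();
        Matcher matcher = MULTI_CAPITAL.matcher(line);
        while (matcher.find()) {
            words.add(matcher.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // Prints [SeVen, UniverSe]
        System.out.println(find("There are SeVen Planets In this UniverSe"));
    }
}
```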

Upvotes: 2

MikeD

Reputation: 3368

Your current regular expression matches only a sequence of two or more consecutive upper-case letters, not capitals spread throughout the word. So a search would find a match in THis and tHIS but not in ThiS, as you have discovered.
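
One more thing to keep in mind: String.matches only returns true when the entire string matches, whereas Matcher.find looks for the pattern anywhere in the input. A small sketch of the difference for [A-Z]{2,} (the class and method names are made up for illustration):

```java
import java.util.regex.Pattern;

final class MatchesVersusFind {

    private static final Pattern TWO_CONSECUTIVE = Pattern.compile("[A-Z]{2,}");

    // True only if the entire input consists of two or more capitals.
    static boolean wholeMatch(String s) {
        return s.matches("[A-Z]{2,}");
    }

    // True if two or more consecutive capitals occur anywhere in the input.
    static boolean partialMatch(String s) {
        return TWO_CONSECUTIVE.matcher(s).find();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"THis", "tHIS", "ThiS", "THIS"}) {
            System.out.println(s + " matches=" + wholeMatch(s) + " find=" + partialMatch(s));
        }
    }
}
```

Only THIS passes matches(); find() succeeds for THis, tHIS and THIS, and neither succeeds for ThiS.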

You need to look for an upper-case letter, possibly some other word characters, and then another upper-case letter. In regex: [A-Z]\w*?[A-Z]

If you want to search the whole string without needing to split it first, then include the possibility of other word characters on either end and let the expression capture: (\w*?[A-Z]\w*?[A-Z]\w*)

Also note that the first two quantifiers are reluctant, so they stop matching at the earliest opportunity, while the last one is the normal (greedy) quantifier, which picks up the rest of the word. The Java regex documentation covers the various quantifiers in more detail.

Upvotes: 0

Jack

Reputation: 133609

Maybe [a-z]*[A-Z][a-z]*[A-Z][a-z]* can work. The problem is that counting with {..} doesn't allow other characters between the two letters.

Upvotes: 2

Uri

Reputation: 89799

The regular expression you listed is not going to work because it will search for a contiguous sequence of 2 or more upper case letters.

I think what you need to do is to write an expression that lets you allow lowercase letters on both sides.

I don't remember the exact syntax (I'm going to check), but something like .*[A-Z].*[A-Z].* will ensure that the word has at least two upper-case letters.
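
A rough sketch of how that could fit into the split-then-match approach from the question, assuming words are separated by whitespace. The class and method names (SplitAndMatch, filter) are just placeholders:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class SplitAndMatch {

    // Compiled once; matches any string containing at least two upper-case ASCII letters.
    private static final Pattern TWO_CAPITALS = Pattern.compile(".*[A-Z].*[A-Z].*");

    // Splits the line into words and keeps those the pattern matches in full.
    static List<String> filter(String line) {
        List<String> result = new ArrayList<>();
        for (String word : line.trim().split("\\s+")) {
            if (TWO_CAPITALS.matcher(word).matches()) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Prints [SeVen, UniverSe]
        System.out.println(filter("There are SeVen Planets In this UniverSe"));
    }
}
```

Because the leading and trailing .* absorb everything around the two capitals, this pattern works with matches(), which must consume the whole word.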

Upvotes: 1
