Brad

Reputation: 4547

How to grab all words that start with capital letters?

I want to create a Java regular expression that matches every word starting with a capital letter followed by any number of upper- or lower-case letters, where the letters may carry accents.

Examples:

Where

Àdónde

Rápido

Àste

Can you please help me with that?

Upvotes: 4

Views: 4749

Answers (3)

Valeriy K.

Reputation: 2904

You can do it without a regular expression. Take the first letter of each word, convert it to lower case, and compare it with the original; if the two differ, the word starts with a capital letter:

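        // seq is assumed to hold the words to check, one per element
        String firstLetter = String.valueOf(seq[i].charAt(0));
        String lowerCase = firstLetter.toLowerCase();
        // The first letter is upper case if lower-casing changes it
        if (!firstLetter.equals(lowerCase))
            System.out.println(seq[i]);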

It will work with any accent.
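
A self-contained sketch of this approach, assuming the input is split on whitespace. The class name `CapitalizedWordsNoRegex` and the helper `capitalizedWords` are illustrative, not part of the snippet above. Accented letters are written as `\u` escapes so the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;

public class CapitalizedWordsNoRegex {

    // Hypothetical helper: returns the whitespace-separated words whose
    // first character changes when lower-cased (i.e. it is upper case).
    static List<String> capitalizedWords(String text) {
        List<String> result = new ArrayList<>();
        for (String word : text.trim().split("\\s+")) {
            if (word.isEmpty()) {
                continue; // an empty input yields a single empty token
            }
            String firstLetter = word.substring(0, 1);
            if (!firstLetter.equals(firstLetter.toLowerCase())) {
                result.add(word);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // The escapes spell the accented words from the question
        String text = "Where is \u00C0d\u00F3nde and r\u00E1pido R\u00E1pido";
        for (String word : capitalizedWords(text)) {
            System.out.println(word);
        }
    }
}
```

Note that this only inspects the first character, so a token such as `X1` or `A-b` also counts as capitalized; the regex answers are stricter about what follows the first letter.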

Upvotes: 0

agiles

Reputation: 1721

Code to detect the words that start with a capital letter in a given paragraph. In this case the input is read from the console.

import java.util.Scanner;

public class problem9 {

    public static void main(String[] args) {
        Scanner in = new Scanner(System.in);
        // \p{Lu} is an upper-case letter, \p{L} is any letter
        String pattern = "(?U)\\b\\p{Lu}\\p{L}*\\b";

        // Read one line from the console and split it on whitespace
        String line1 = in.nextLine();
        String delimiter = "\\s";
        String[] words1 = line1.split(delimiter);

        // Print every word that matches the pattern in its entirety
        for (int i = 0; i < words1.length; i++) {
            if (words1[i].matches(pattern)) {
                System.out.println(words1[i]);
            }
        }
    }
}

For example, given the input

Input: This is my First Program

Output:

This

First

Program

Upvotes: 1

Tim Pietzcker

Reputation: 336378

Regex:

\b\p{Lu}\p{L}*\b

Java string:

"(?U)\\b\\p{Lu}\\p{L}*\\b"

Explanation:

\b      # Match at a word boundary (start of word)
\p{Lu}  # Match an uppercase letter
\p{L}*  # Match any number of letters (any case)
\b      # Match at a word boundary (end of word)
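
As a minimal sketch of how this expression could be used to pull the matching words out of a longer string, the following uses `Pattern` and `Matcher.find()`. The class name `CapitalizedWordFinder` and the method `find` are illustrative. Accented letters are written as `\u` escapes so the source compiles regardless of file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CapitalizedWordFinder {

    // (?U) turns on UNICODE_CHARACTER_CLASS (Java 7+), so \b and the
    // predefined character classes follow Unicode rules.
    private static final Pattern CAPITALIZED =
            Pattern.compile("(?U)\\b\\p{Lu}\\p{L}*\\b");

    // Hypothetical helper: collects every match in order of appearance.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The escapes spell the accented words from the question
        String text = "Where is \u00C0d\u00F3nde, r\u00E1pido or R\u00E1pido?";
        for (String word : find(text)) {
            System.out.println(word);
        }
    }
}
```

Using `find()` rather than splitting on whitespace means punctuation next to a word (a trailing comma or question mark) does not prevent the match.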

Caveat: This only works correctly in Java 7 (JDK7) and later, where the (?U) flag is available; on older versions you may need to substitute a longer sub-regex for \b. Following @tchrist's workaround (kudos to him), you may need to use

(?:(?<=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])|(?<![\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]])(?=[\pL\pM\p{Nd}\p{Nl}\p{Pc}[\p{InEnclosedAlphanumerics}&&\p{So}]]))

for \b, so the Java string would look like this (only the backslashes are doubled; the brackets of the nested character class stay unescaped):

"(?:(?<=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])|(?<![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]))\\p{Lu}\\p{L}*(?:(?<=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])|(?<![\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]])(?=[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]))"

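Because the long form repeats the same character class eight times, it may be easier to assemble it from a single constant. The sketch below does that, following the raw regex for \b shown earlier; the names `LegacyBoundaryFinder`, `WORD_CHAR`, `BOUNDARY` and `find` are illustrative and not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LegacyBoundaryFinder {

    // Characters treated as word characters by the replacement for \b.
    // The inner [...] is a nested class, so its brackets are not escaped.
    private static final String WORD_CHAR =
            "[\\pL\\pM\\p{Nd}\\p{Nl}\\p{Pc}[\\p{InEnclosedAlphanumerics}&&\\p{So}]]";

    // A boundary: a word character behind and none ahead, or the reverse.
    private static final String BOUNDARY =
            "(?:(?<=" + WORD_CHAR + ")(?!" + WORD_CHAR + ")"
            + "|(?<!" + WORD_CHAR + ")(?=" + WORD_CHAR + "))";

    private static final Pattern CAPITALIZED =
            Pattern.compile(BOUNDARY + "\\p{Lu}\\p{L}*" + BOUNDARY);

    // Hypothetical helper: collects every match in order of appearance.
    static List<String> find(String text) {
        List<String> words = new ArrayList<>();
        Matcher m = CAPITALIZED.matcher(text);
        while (m.find()) {
            words.add(m.group());
        }
        return words;
    }

    public static void main(String[] args) {
        // The escapes spell the accented words from the question
        String text = "Where is \u00C0d\u00F3nde, r\u00E1pido or R\u00E1pido?";
        for (String word : find(text)) {
            System.out.println(word);
        }
    }
}
```

Building the pattern by concatenation keeps the character class in one place, so a change to what counts as a word character only has to be made once.
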
Upvotes: 8
