Manuelarte

Reputation: 1810

How to extract the first word of a string when it can only be one of three options

I can have these three kinds of Strings:

ALPHA_whatever_1234567

BETA_whateverDifferent_7654321

GAMMA_anotherOption_1237654

I want to extract the beginning of each String, which is ALPHA, BETA or GAMMA.

So, for example, I would like to get:

ALPHA_whatever_1234567 -> ALPHA

BETA_whateverDifferent_7654321 -> BETA

GAMMA_anotherOption_1237654 -> GAMMA

I want to use a regular expression, and I tried something like this:

private static final Pattern PATTERN = Pattern.compile("(.*)_.*");

But it doesn't work for some Strings. I recover the beginning with:

Matcher m = PATTERN.matcher(string);
m.find(1);

I also tried this Pattern:

private static final Pattern PATTERN = Pattern.compile("([ALPHA]|[BETA]|[GAMMA])_.*");

But it returns only the first character of the String.

What am I doing wrong?

Upvotes: 0

Views: 58

Answers (5)

shyam

Reputation: 1388

If you don't insist on using regular expressions, you could try this:

String firstWord = myString.split("_")[0];

Here, myString holds your input String.
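A runnable sketch of the split approach; the class `SplitPrefix` and method `firstWord` are hypothetical names. It passes a limit of 2 to `split`, a small variation on the answer's `split("_")`: splitting stops after the first underscore, and an input made only of underscores yields `""` instead of an empty array (which would make `[0]` throw).

```java
public class SplitPrefix {
    // Text before the first underscore, or the whole string if there is none.
    // The limit of 2 stops splitting after the first '_' and keeps trailing parts intact.
    static String firstWord(String s) {
        return s.split("_", 2)[0];
    }

    public static void main(String[] args) {
        String[] inputs = {
            "ALPHA_whatever_1234567",
            "BETA_whateverDifferent_7654321",
            "GAMMA_anotherOption_1237654"
        };
        for (String in : inputs) {
            System.out.println(in + " -> " + firstWord(in));
        }
    }
}
```

Unlike the regex answers, this does not check that the prefix is one of ALPHA, BETA or GAMMA; any text before the first underscore is returned.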

Upvotes: 1

August

Reputation: 12558

[...] in regex is a character class. A character class can only match a single character.

So [ALPHA] really means "match exactly one of these characters: A, L, P, H".

If you remove the brackets, then it will match the entire word:

(ALPHA|BETA|GAMMA)_.*
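A small sketch comparing the question's bracketed pattern with the bracket-free one, both read with `find()` and `group(1)`; the class `CharClassDemo` and helper `firstGroup` are hypothetical. With the brackets, `find()` slides along the input until a single class character is directly followed by `_`, which in `ALPHA_whatever_1234567` is the `A` at index 4.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CharClassDemo {
    // The question's pattern: each [...] matches exactly one character.
    static final Pattern BROKEN = Pattern.compile("([ALPHA]|[BETA]|[GAMMA])_.*");
    // Brackets removed: an alternation of whole words.
    static final Pattern FIXED = Pattern.compile("(ALPHA|BETA|GAMMA)_.*");

    // Returns group 1 of the first match found by find(), or null if there is none.
    static String firstGroup(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "ALPHA_whatever_1234567";
        System.out.println(firstGroup(BROKEN, s)); // A
        System.out.println(firstGroup(FIXED, s));  // ALPHA
    }
}
```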

Upvotes: 1

mlwn

Reputation: 1187

String strr = "ALPHA_whatever_1234567";
String[] result = strr.split("_");
return result[0];

Upvotes: 0

M A

Reputation: 72854

Just remove the brackets around ALPHA, BETA and GAMMA, since they create character classes: [ALPHA] matches exactly one of the letters A, L, P or H.

private static final Pattern PATTERN = Pattern.compile("(ALPHA|BETA|GAMMA)_.*");
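A minimal, self-contained sketch of how the corrected pattern might be used; the class `PrefixExtractor`, the method `prefixOf` and the `DELTA_something_1` input are hypothetical. It reads the captured word with `matches()` and `group(1)`. It also shows why the question's `m.find(1)` would not help even with the fixed pattern: `find(int)` resets the matcher, starts searching at the given index and returns a boolean, not the captured text.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class PrefixExtractor {
    static final Pattern PATTERN = Pattern.compile("(ALPHA|BETA|GAMMA)_.*");

    // Returns ALPHA, BETA or GAMMA, or null if the input has none of those prefixes.
    static String prefixOf(String input) {
        Matcher m = PATTERN.matcher(input);
        // matches() requires the whole input to match; group(1) then holds the captured word.
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(prefixOf("GAMMA_anotherOption_1237654")); // GAMMA
        System.out.println(prefixOf("DELTA_something_1"));            // null
        // find(1) starts searching at index 1, skipping the leading 'A', so nothing matches.
        System.out.println(PATTERN.matcher("ALPHA_whatever_1234567").find(1)); // false
    }
}
```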

Upvotes: 2

Sergey Kalinichenko

Reputation: 726589

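Your regex does not work because the greedy .* inside the group consumes too much: it swallows the first underscore and only backtracks as far as the last one, so the group captures ALPHA_whatever instead of ALPHA. Here is how you can fix it: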

private static final Pattern PATTERN = Pattern.compile("([^_]*)_.*");
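A short sketch contrasting the greedy group from the question with the negated class `[^_]*`; the class `GreedyVsNegated` and helper `group1` are hypothetical names, and `matches()` is assumed as the way the group is read.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GreedyVsNegated {
    static final Pattern GREEDY = Pattern.compile("(.*)_.*");     // the question's first attempt
    static final Pattern NEGATED = Pattern.compile("([^_]*)_.*"); // cannot cross an underscore

    // Returns group 1 if the whole input matches, otherwise null.
    static String group1(Pattern p, String s) {
        Matcher m = p.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String s = "BETA_whateverDifferent_7654321";
        // Greedy .* grabs everything, then backtracks only until a '_' can follow:
        // that is the LAST underscore, so the group keeps the middle part too.
        System.out.println(group1(GREEDY, s));  // BETA_whateverDifferent
        // [^_]* stops at the first underscore.
        System.out.println(group1(NEGATED, s)); // BETA
    }
}
```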

Another alternative would be to use a "reluctant" quantifier for the asterisk (.*?), but that may lead to catastrophic backtracking.
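A minimal sketch of the reluctant variant, assuming `matches()` is used to read the group; the class `ReluctantPrefix` and method `prefixOf` are hypothetical. On these inputs it returns the same prefix as `[^_]*`.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class ReluctantPrefix {
    // *? is the reluctant form of *: it matches as few characters as possible.
    static final Pattern RELUCTANT = Pattern.compile("(.*?)_.*");

    // Returns the text before the first underscore, or null if there is no underscore.
    static String prefixOf(String s) {
        Matcher m = RELUCTANT.matcher(s);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // The group grows one character at a time until a '_' follows, i.e. the first underscore.
        System.out.println(prefixOf("GAMMA_anotherOption_1237654")); // GAMMA
    }
}
```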

Your other solution uses character classes [] incorrectly. The correct expression would have no square brackets, like this:

private static final Pattern PATTERN = Pattern.compile("(ALPHA|BETA|GAMMA)_.*");

Upvotes: 1
