Maroun

Reputation: 95948

Extracting numbers from a fixed format String

I have a fixed format for a String, which will always be: SPXXX-SPYYY.zip

I need to extract XXX and YYY from the String as numbers without leading zeros: if XXX is 003, for example, I want 3 and not 003. (The same goes for YYY.)

I've written these two snippets:

1.

String st = "SP003-SP012.zip";
String[] splitted = st.split("\\.");   // ["SP003-SP012", "zip"]
splitted = splitted[0].split("-");     // ["SP003", "SP012"]
splitted = splitted[0].split("P");     // ["S", "003"]
int in = Integer.parseInt(splitted[1]); // parseInt drops the leading zeros
System.out.println(in); // Will print 3
// The same for the other part of the String

2.

Pattern pattern = Pattern.compile("^[a-zA-Z]+([0-9]+).*");
Matcher matcher = pattern.matcher(st);
int num = 0;
while (matcher.find()) {
   num = Integer.parseInt(matcher.group(1));
   System.out.println(num);
} 

Upvotes: 0

Views: 1381

Answers (5)

Rohit Jain

Reputation: 213193

Why does the second code return only the first number (XXX) and miss the second?

Look at your pattern, "^[a-zA-Z]+([0-9]+).*": it starts with the anchor ^, so it can only match at the beginning of the string. That is why you get only the first number, from the SPXXX at the start of "SPXXX-SPYYY"; SPYYY is not at the beginning, so it is never matched.

Remove the caret (^), and drop the .* at the end as well: since you are using Matcher#find(), a trailing .* would consume the rest of the string and leave nothing for the next find() call.

Pattern pattern = Pattern.compile("[a-zA-Z]+([0-9]+)");

But, given that your string will always be in the same format, you can even use a simpler pattern:

Pattern pattern = Pattern.compile("\\d+");

and read each number with matcher.group(). (This pattern has no capture group, so group(1) would throw an IndexOutOfBoundsException.)
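As a rough sketch of how the simpler pattern could be used end to end, the snippet below loops over find() and parses each whole match. The class name DigitRuns and the helper extract are made up for illustration; it assumes the only digits in the file name are the two numbers of interest.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitRuns {
    // Matches any run of one or more digits; there is no capture group.
    private static final Pattern DIGITS = Pattern.compile("\\d+");

    // Hypothetical helper: returns every run of digits in the input as an int.
    static List<Integer> extract(String input) {
        List<Integer> numbers = new ArrayList<>();
        Matcher matcher = DIGITS.matcher(input);
        while (matcher.find()) {
            // group() is the whole match; parseInt drops the leading zeros.
            numbers.add(Integer.parseInt(matcher.group()));
        }
        return numbers;
    }

    public static void main(String[] args) {
        System.out.println(extract("SP003-SP012.zip")); // prints [3, 12]
    }
}
```

If the file name could ever contain other digits (in the extension, say), the more specific pattern above would be the safer choice.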

What code is better for this purpose?

I would go with the 2nd approach. Splitting the string is brittle and gets complicated as the format grows. Use split only when you actually want to break a string on some delimiter. Here you don't want to split the string; you want to extract a particular pattern, and for that the 2nd approach is the way to go.

Upvotes: 1

Alberto Segura

Reputation: 755

Use the following:

Pattern pattern = Pattern.compile("^[a-zA-Z]+0*(\\d+)-[a-zA-Z]+0*(\\d+).*");
Matcher matcher = pattern.matcher(st);
if (matcher.matches()) {
   int num1 = Integer.parseInt(matcher.group(1));
   int num2 = Integer.parseInt(matcher.group(2));
   System.out.println(num1+" - "+num2);
} 

Upvotes: 1

user2030471

Reputation:

Define the pattern like this: Pattern.compile("[a-zA-Z]+([0-9]+)");

For the example string, matcher.find() matches SPXXX and SPYYY on the two iterations of the loop.

group(1) then returns XXX and YYY for the two cases respectively.
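To make the two iterations concrete, here is a small sketch that records, for each find() call, the whole match, the captured group, and the parsed number. GroupDemo and describe are illustrative names, not part of the question's code.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupDemo {
    // Letters followed by a captured run of digits; no anchor and no trailing .*
    private static final Pattern PART = Pattern.compile("[a-zA-Z]+([0-9]+)");

    // Hypothetical helper: one line per find() iteration,
    // formatted as "<whole match> -> <group 1> -> <parsed int>".
    static String describe(String input) {
        StringBuilder out = new StringBuilder();
        Matcher matcher = PART.matcher(input);
        while (matcher.find()) {
            out.append(matcher.group())
               .append(" -> ").append(matcher.group(1))
               .append(" -> ").append(Integer.parseInt(matcher.group(1)))
               .append('\n');
        }
        return out.toString();
    }

    public static void main(String[] args) {
        // Prints:
        // SP003 -> 003 -> 3
        // SP012 -> 012 -> 12
        System.out.print(describe("SP003-SP012.zip"));
    }
}
```

The trailing "zip" is not reported because the pattern requires at least one digit after the letters.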

Upvotes: 1

T.J. Crowder

Reputation: 1074028

Why does the second code return only the first number (XXX) and miss the second?

Because your regular expression expects only one series of digits and has only one capture group to capture them. It looks for letters followed by digits, anchored at the start of the string, and finds only one thing that matches. (Once the first match is consumed, the trailing .* has swallowed the rest of the string, so nothing is left to match your [a-zA-Z]+.) Rather than running the matcher repeatedly, I'd probably define a single regular expression that matches both parts:

Pattern pattern = Pattern.compile("^[a-zA-Z]+([0-9]+)-[a-zA-Z]+([0-9]+).*");

...and use the resulting two capture groups. (Also note you can use \d to match a digit:

Pattern pattern = Pattern.compile("^[a-zA-Z]+(\\d+)-[a-zA-Z]+(\\d+).*");

...but that's a side note.)
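A minimal sketch of the single-expression idea, assuming the second number is also preceded by letters (as in SPXXX-SPYYY), so the pattern includes [a-zA-Z]+ after the hyphen. The class TwoGroups and the method parse are illustrative names only.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TwoGroups {
    // Assumed shape: letters, digits, a hyphen, letters, digits, then anything.
    private static final Pattern NAME =
            Pattern.compile("^[a-zA-Z]+(\\d+)-[a-zA-Z]+(\\d+).*");

    // Hypothetical helper: returns {XXX, YYY} as ints,
    // or null if the input does not fit the assumed format.
    static int[] parse(String input) {
        Matcher matcher = NAME.matcher(input);
        if (!matcher.matches()) {
            return null;
        }
        return new int[] {
            Integer.parseInt(matcher.group(1)), // first number, leading zeros dropped
            Integer.parseInt(matcher.group(2))  // second number
        };
    }

    public static void main(String[] args) {
        int[] parts = parse("SP003-SP012.zip");
        System.out.println(parts[0] + " - " + parts[1]); // prints 3 - 12
    }
}
```

Returning null for a non-matching name is just one option; throwing an exception may suit callers that treat a bad name as an error.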

Is using a regex for this purpose better than the first code I suggested?

That's up to you; it's a judgement call. For this specific case, if the format really is invariant, I'd go with Aleks G's approach.

Upvotes: 1

Aleks G

Reputation: 57306

If it's always the same format, then why not just use substring?

String str = "SP003-SP456.zip";
int xxx = Integer.parseInt(str.substring(2, 5));
int yyy = Integer.parseInt(str.substring(8, 11));

Or, if XXX and YYY are not necessarily numbers, add a try-catch:

String str = "SP003-SP456.zip";
int xxx, yyy;

try {
    xxx = Integer.parseInt(str.substring(2, 5));
}
catch (NumberFormatException e) {
    xxx = 0; // fall back to 0 when XXX is not a number
}

try {
    yyy = Integer.parseInt(str.substring(8, 11));
}
catch (NumberFormatException e) {
    yyy = 0; // fall back to 0 when YYY is not a number
}
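The two try-catch blocks could be folded into one small helper. This is only a sketch: FixedOffsets and parseOrDefault are invented names, and the extra bounds check (for strings shorter than expected) is an addition not present in the answer above.

```java
public class FixedOffsets {
    // Hypothetical helper: parses input[begin, end) as an int and returns
    // fallback if the slice is out of range or is not a number.
    static int parseOrDefault(String input, int begin, int end, int fallback) {
        if (input == null || begin < 0 || begin > end || end > input.length()) {
            return fallback;
        }
        try {
            return Integer.parseInt(input.substring(begin, end));
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String str = "SP003-SP456.zip";
        System.out.println(parseOrDefault(str, 2, 5, 0));  // prints 3
        System.out.println(parseOrDefault(str, 8, 11, 0)); // prints 456
    }
}
```

Using 0 as the fallback mirrors the answer; a caller that needs to tell "missing" apart from a real 0 might pass a different sentinel.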

Upvotes: 4
