Reputation: 95948
I have a fixed format for a String that will always be: SPXXX-SPYYY.zip
I need to extract XXX and YYY from the String, but if, for example, XXX is 003, then I want 3 and not 003 (the same goes for YYY).
I've written these two snippets:
1.
String st = "SP003-SP012.zip";
String[] splitted = st.split("\\.");
splitted = splitted[0].split("-");
splitted = splitted[0].split("P");
Integer in = new Integer(splitted[1]);
System.out.println(in); //Will print 3
//The same for the other part of the String
2.
Pattern pattern = Pattern.compile("^[a-zA-Z]+([0-9]+).*");
Matcher matcher = pattern.matcher(st);
int num = 0;
while (matcher.find()) {
    num = Integer.parseInt(matcher.group(1));
    System.out.println(num);
}
Upvotes: 0
Views: 1381
Reputation: 213193
Why the second code returns only the first number? (XXX) and misses the second?
If you look at your pattern, "^[a-zA-Z]+([0-9]+).*", it has a caret anchor (^) at the beginning. That means the pattern can only match at the beginning of the string. That is why you got only the first number, corresponding to the SPXXX found at the beginning of "SPXXX-SPYYY", and nothing for SPYYY: it is not at the beginning, so it won't be matched.
You can remove the caret (^), and you don't want the .* at the end either, since you are using the Matcher#find() method. With the .* in place, the first match consumes the rest of the string, leaving nothing for find() to locate on the next call.
Pattern pattern = Pattern.compile("[a-zA-Z]+([0-9]+)");
But, given that your string will always be in the same format, you can even use a simpler pattern:
Pattern pattern = Pattern.compile("\\d+");
and read each number with matcher.group(). (This pattern has no capture group, so group(1) would throw an IndexOutOfBoundsException.)
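As a minimal sketch of the bare \d+ approach, assuming the input always contains the two numbers in order: each call to find() advances to the next run of digits, and group() with no argument returns the whole match. The class name DigitsDemo and the helper extract are hypothetical names used only for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class DigitsDemo {
    // Hypothetical helper: returns the first two runs of digits as ints.
    static int[] extract(String name) {
        Matcher matcher = Pattern.compile("\\d+").matcher(name);
        int[] result = new int[2];
        for (int i = 0; i < result.length; i++) {
            if (!matcher.find()) {
                throw new IllegalArgumentException("Expected two numbers in: " + name);
            }
            // group() is the whole match; parseInt drops the leading zeros.
            result[i] = Integer.parseInt(matcher.group());
        }
        return result;
    }

    public static void main(String[] args) {
        int[] parts = extract("SP003-SP012.zip");
        System.out.println(parts[0] + " - " + parts[1]); // prints "3 - 12"
    }
}
```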
What code is better for this purpose?
I would go with the 2nd approach. Splitting may not always work, and it becomes complicated as the string grows. You should only use split when you actually want to split your string on some delimiter. In this case you don't want to split the string; you want to extract a particular pattern, and the 2nd approach is the way to go.
Upvotes: 1
Reputation: 755
Use the following:
Pattern pattern = Pattern.compile("^[a-zA-Z]+0*(\\d+)-[a-zA-Z]+0*(\\d+).*");
Matcher matcher = pattern.matcher(st);
if (matcher.matches()) {
    int num1 = Integer.parseInt(matcher.group(1));
    int num2 = Integer.parseInt(matcher.group(2));
    System.out.println(num1 + " - " + num2);
}
Upvotes: 1
Reputation:
Define the pattern like this: Pattern.compile("[a-zA-Z]+([0-9]+)");
For the example string, the matcher matches SPXXX and SPYYY in the two iterations of the loop, and group(1) returns XXX and YYY for the two cases respectively.
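The two iterations can be made visible with a small sketch along these lines. It records the whole match (group()) next to the parsed capture group (group(1)) for each pass of the find() loop; the class name FindLoopDemo and the helper describeMatches are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FindLoopDemo {
    // Hypothetical helper: one entry per find() iteration, "match -> number".
    static List<String> describeMatches(String input) {
        Matcher matcher = Pattern.compile("[a-zA-Z]+([0-9]+)").matcher(input);
        List<String> lines = new ArrayList<>();
        while (matcher.find()) {
            int number = Integer.parseInt(matcher.group(1));
            lines.add(matcher.group() + " -> " + number);
        }
        return lines;
    }

    public static void main(String[] args) {
        // Prints "SP003 -> 3" and then "SP012 -> 12".
        for (String line : describeMatches("SP003-SP012.zip")) {
            System.out.println(line);
        }
    }
}
```

The trailing ".zip" produces no third match because its letters are not followed by digits.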
Upvotes: 1
Reputation: 1074028
Why the second code returns only the first number? (XXX) and misses the second?
Because your regular expression only defines that it's expecting to see one series of digits, and has only one capture group to capture them. The regular expression expects to see letters followed by digits, and only finds one thing that matches that. (The ^ anchor ties the match to the start of the string, and the trailing .* consumes everything after the first number, so nothing is left for a second match.) Rather than trying to run the matcher repeatedly, I'd probably define a single regular expression that matched both bits:
Pattern pattern = Pattern.compile("^[a-zA-Z]+([0-9]+)-[a-zA-Z]+([0-9]+).*");
...and use the resulting two capture groups. (Also note you can use \d to match a digit:
Pattern pattern = Pattern.compile("^[a-zA-Z]+(\\d+)-[a-zA-Z]+(\\d+).*");
...but that's a side note.)
Is using a regex for this purpose better than the first code I suggested?
That's up to you, it's a judgement call. For this specific case, if the format really is invariant, I'd go with Aleks G's approach.
Upvotes: 1
Reputation: 57306
If it's always the same format, then why not just use substring?
String str = "SP003-SP456.zip";
int xxx = Integer.parseInt(str.substring(2, 5));
int yyy = Integer.parseInt(str.substring(8, 11));
Or, if XXX and YYY are not necessarily numbers, then just add try-catch:
String str = "SP003-SP456.zip";
int xxx, yyy;
try {
    // Assign to the variable declared above; redeclaring it here would not compile.
    xxx = Integer.parseInt(str.substring(2, 5));
}
catch(NumberFormatException e) {
    xxx = 0;
}
try {
    yyy = Integer.parseInt(str.substring(8, 11));
}
catch(NumberFormatException e) {
    yyy = 0;
}
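The two try-catch blocks could also be folded into one helper, roughly as sketched here. The class name SubstringDemo, the helper parseOrDefault, and the fallback parameter are hypothetical; the sketch still assumes the string is long enough for the given indices, since substring itself throws on a shorter input.

```java
public class SubstringDemo {
    // Hypothetical helper: parses name[begin, end) as an int,
    // returning fallback when that slice is not a valid number.
    static int parseOrDefault(String name, int begin, int end, int fallback) {
        try {
            return Integer.parseInt(name.substring(begin, end));
        } catch (NumberFormatException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        String str = "SP003-SP456.zip";
        int xxx = parseOrDefault(str, 2, 5, 0);
        int yyy = parseOrDefault(str, 8, 11, 0);
        System.out.println(xxx + " - " + yyy); // prints "3 - 456"
    }
}
```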
Upvotes: 4