Reputation: 485
I am struggling a bit to create a regex pattern for use with the matches() method of String. My String value looks something like this -
3012145A_20348409-146139460.ABCDxyzPQr.1.1.xml
I am using the String.matches("regex") method but, to be honest, I am struggling to create a pattern that matches String values like these. I tried a few different combinations, but in vain so far, and I searched the internet for examples. The values will always be in a similar format, though the length might vary.
Any help is much appreciated.
There is more to it than just matching .xml
Well, apart from the example given, there will be other values in the List too, so I need to match values like
3012145A_20348409-146139460.ABCDxyzPQr.1.1.xml
The list of values could be like -
3012145A_20348409-146139460.ABCDxyzPQr.1.1.xml
3012145_Error.xml
3012145_UK.pdf
3012145A_20348409.ABC.10.10.10.xml
I need the first value among these
(alphanum)(underscore)(num)(hyphen)(num)(dot)(aLpHa)(dot)(num)(dot)(num)(dot)(.xml)
I tried this -
s.matches("[a-zA-Z0-9]_[0-9]-[0-9].[a-zA-Z].[0-9].[0-9].xml");
Upvotes: 0
Views: 785
Reputation: 485
Brilliant! Thanks a lot, Favonius.
That worked perfectly.
So, as I understand it, even though I was giving a range [0-9a-zA-Z], it was actually trying to match only a single char, in my example the first one, 3. So in reality, rather than 3012145A, it was checking only whether 3 is part of my given range ([0-9a-zA-Z]), and so forth for the rest of the String.
In your solution, \w* checks whether a whole section is made up of word characters (letters, digits or underscore), and \d* checks whether a section (bounded by delimiters such as . or _) is made up entirely of digits.
So a much murkier way of matching 3012145A_ could be
[0-9][0-9][0-9][0-9][0-9][0-9][0-9][a-zA-Z]_
I am not proposing this solution, just trying to understand the behavior and the difference between [0-9] and \d*.
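To check that understanding, here is a minimal sketch comparing the variants on the prefix 3012145A_. The class name PrefixDemo and the helper matches are made up for illustration; only Pattern.matches comes from the standard library.

```java
import java.util.regex.Pattern;

class PrefixDemo {
    // Hypothetical helper: true only if the WHOLE input matches the regex,
    // which is also how String.matches() behaves.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        String prefix = "3012145A_";

        // One character class without a quantifier matches exactly one char.
        System.out.println(matches("[0-9a-zA-Z]_", prefix));      // false

        // The "murky" version: one class per character.
        System.out.println(matches(
                "[0-9][0-9][0-9][0-9][0-9][0-9][0-9][a-zA-Z]_", prefix)); // true

        // Same idea with a counted quantifier.
        System.out.println(matches("[0-9]{7}[a-zA-Z]_", prefix)); // true

        // \w* accepts a run of any length (including zero) of word characters.
        System.out.println(matches("\\w*_", prefix));             // true
    }
}
```

The explicit versions only accept exactly seven digits followed by one letter, whereas \w*_ accepts a prefix of any length.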
I still have a question though: what is the purpose of (\\.)?\\. in the pattern?
Thanks a lot again
Upvotes: 0
Reputation: 13974
Requirement:
(alphanum)(underscore)(num)(hyphen)(num)(dot)(aLpHa)(dot)(num)(dot)(num)(dot)(.xml)
Proposed regex:
\w*_\d*-\d*\.([a-zA-Z])*\.\d*\.\d*(\.)?\.xml
In Java this translates to:
Pattern p = Pattern.compile("\\w*_\\d*-\\d*\\.([a-zA-Z])*\\.\\d*\\.\\d*(\\.)?\\.xml",Pattern.CASE_INSENSITIVE);
Note: As I am using [a-zA-Z], you might not need Pattern.CASE_INSENSITIVE.
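As a rough check, the sketch below runs the proposed regex against the four sample values from the question. The class name FileNameMatcher and the method isWanted are hypothetical names chosen for this example, and the CASE_INSENSITIVE flag is left out. The last check shows what the optional (\\.)? group does: it also lets a doubled dot before xml through.

```java
import java.util.List;
import java.util.regex.Pattern;

class FileNameMatcher {
    // The regex proposed above, compiled once and reused.
    static final Pattern FILE_NAME = Pattern.compile(
            "\\w*_\\d*-\\d*\\.([a-zA-Z])*\\.\\d*\\.\\d*(\\.)?\\.xml");

    // Hypothetical helper: matches() requires the whole name to match,
    // the same way String.matches() does.
    static boolean isWanted(String name) {
        return FILE_NAME.matcher(name).matches();
    }

    public static void main(String[] args) {
        List<String> names = List.of(
                "3012145A_20348409-146139460.ABCDxyzPQr.1.1.xml",
                "3012145_Error.xml",
                "3012145_UK.pdf",
                "3012145A_20348409.ABC.10.10.10.xml");

        for (String name : names) {
            System.out.println(name + " -> " + isWanted(name));
        }

        // The optional group (\\.)? tolerates an extra dot before ".xml".
        System.out.println(
                isWanted("3012145A_20348409-146139460.ABCDxyzPQr.1.1..xml"));
    }
}
```

Only the first of the four sample values should come back as a match; the other three lack the hyphen-separated number or the .xml extension.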
Problem with your regex: s.matches("[a-zA-Z0-9]_[0-9]-[0-9].[a-zA-Z].[0-9].[0-9].xml");
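Each character class in your pattern, such as [0-9] or [a-zA-Z0-9], matches exactly one character, so you are looking for a single alpha, number or alphanumeric character at each position. Put the * (zero or more) or + (one or more) metacharacter after each class to match a whole run of characters. Also note that an unescaped . matches any character; escape it as \\. to match a literal dot.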
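A small sketch of the difference between * and +, and of the unescaped dot. The class name QuantifierDemo and the helper matches are invented for this example; Pattern.matches is the standard library call.

```java
import java.util.regex.Pattern;

class QuantifierDemo {
    // Hypothetical helper around Pattern.matches, which tests the whole input.
    static boolean matches(String regex, String input) {
        return Pattern.matches(regex, input);
    }

    public static void main(String[] args) {
        // No quantifier: exactly one digit.
        System.out.println(matches("[0-9]", "3"));         // true
        System.out.println(matches("[0-9]", "3012145"));   // false

        // '+' needs at least one digit, '*' also accepts none.
        System.out.println(matches("[0-9]+", "3012145"));  // true
        System.out.println(matches("[0-9]+", ""));         // false
        System.out.println(matches("[0-9]*", ""));         // true

        // An unescaped '.' matches any character, not just a literal dot.
        System.out.println(matches("a.xml", "aZxml"));     // true
        System.out.println(matches("a\\.xml", "aZxml"));   // false
        System.out.println(matches("a\\.xml", "a.xml"));   // true
    }
}
```

If empty segments should be rejected (for example a name like _-...xml), + is probably the safer choice than *.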
Hope this helps.
Upvotes: 3