Reputation: 66
Here is something that I don't really understand.
I would like to get the date part from the following string:
<th>Elkezdodott</th>
<td>2016. december 20., 19:29</td>
So I use the following code:
System.out.println(html);
Pattern p = Pattern.compile("\\p{Punct}th\\p{Punct}Elkezdodott\\p{Punct}{2}th\\p{Punct}\\p{Space}*" +
"\\p{Punct}td\\p{Punct}" +
"(\\d{4}\\p{Punct}\\p{Space}*[a-zA-Z]*\\p{Space}*\\d*\\p{Punct}{2}" +
"\\p{Space}*\\d{2}\\p{Punct}\\d{2})\\p{Punct}{2}td\\p{Punct}");
Matcher m = p.matcher(html);
if (m.matches()) {
    System.out.println("matches");
    System.out.println(m.group());
}
This regex seems correct according to the Check RegExp option in Android Studio.
The output of System.out.println(html) is exactly the same as the sample text I checked there:
06-03 11:49:15.779 4581-5229/hu.lyra.moly_kihivasok I/System.out: <th>Elkezdodott</th>
06-03 11:49:15.779 4581-5229/hu.lyra.moly_kihivasok I/System.out: <td>2016. december 20., 19:29</td>
What I really don't understand is why m.matches() returns false. I also tried m.find(), but I got the same result. Did I miss something?
Thanks for any advice.
Upvotes: 0
Views: 749
Reputation: 16089
I've executed your exact example and it matches the string. The only thing you did wrong is not passing an argument to the group() function. Without an argument, group() returns the entire match, so you need to specify which capturing group you want to retrieve. In your case that is the first one, so use group(1).
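To illustrate the difference between group() and group(1), here is a minimal sketch that runs the pattern from the question on a desktop JVM. The class and method names (GroupDemo, wholeMatch, datePart) are made up for this example, and the input string is assumed to be exactly the two lines shown in the question, separated by a single newline.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the regex itself comes from the question.
final class GroupDemo {
    // The pattern from the question, unchanged.
    static final Pattern P = Pattern.compile(
            "\\p{Punct}th\\p{Punct}Elkezdodott\\p{Punct}{2}th\\p{Punct}\\p{Space}*" +
            "\\p{Punct}td\\p{Punct}" +
            "(\\d{4}\\p{Punct}\\p{Space}*[a-zA-Z]*\\p{Space}*\\d*\\p{Punct}{2}" +
            "\\p{Space}*\\d{2}\\p{Punct}\\d{2})\\p{Punct}{2}td\\p{Punct}");

    // group() with no argument is the same as group(0): the entire match.
    static String wholeMatch(String html) {
        Matcher m = P.matcher(html);
        return m.matches() ? m.group() : null;
    }

    // group(1) is the text captured by the first pair of parentheses.
    static String datePart(String html) {
        Matcher m = P.matcher(html);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed input: the two lines from the question joined by "\n".
        String html = "<th>Elkezdodott</th>\n<td>2016. december 20., 19:29</td>";
        System.out.println(wholeMatch(html)); // the full <th>...</td> text
        System.out.println(datePart(html));   // only the date part
    }
}
```

Note that matches() requires the whole input to match the pattern, so both helpers return null if html contains anything before the <th> or after the </td>.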
Btw, why are you using such a complicated pattern to match your string? I would not use \p{} that often, because it makes the pattern unreadable. Just use this:
"<th>Elkezdodott</th>\\n<td>(\\d{4}\\.\\s*[a-zA-Z]+\\s*\\d{1,2}\\.,\\s*\\d{2}:\\d{2})</td>"
Btw, one more thing: you shouldn't use regex to parse HTML. Use an HTML parser instead; there are plenty around. If you try to parse HTML with regex, you will soon run into major problems (nesting, malformed HTML such as missing end tags, etc.).
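If you do stay with a regex for this one-off case, the simplified pattern above could be used roughly like this. It is only a sketch: the class and method names (SimplePatternDemo, findDate) are made up, and it assumes the <th> and <td> lines are separated by exactly one "\n", as the pattern requires. It uses find() rather than matches(), so the two lines may sit anywhere inside a larger page.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class built around the simplified pattern from this answer.
final class SimplePatternDemo {
    static final Pattern DATE = Pattern.compile(
            "<th>Elkezdodott</th>\\n<td>(\\d{4}\\.\\s*[a-zA-Z]+\\s*\\d{1,2}\\.,\\s*\\d{2}:\\d{2})</td>");

    // Returns the captured date, or null if the pattern occurs nowhere in html.
    static String findDate(String html) {
        Matcher m = DATE.matcher(html);
        // find() scans for the pattern inside the input; matches() would
        // demand that the entire input is exactly the two lines.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        // Assumed input: the two lines from the question wrapped in a table row.
        String html = "<tr>\n<th>Elkezdodott</th>\n<td>2016. december 20., 19:29</td>\n</tr>";
        System.out.println(findDate(html));
    }
}
```

If the real page uses "\r\n" or extra indentation between the two tags, the literal \\n in the pattern will not match; replacing it with \\s* would be one way to loosen it.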
Upvotes: 1