Reputation: 1290
I'm trying to match one of the following two examples:
Example input 1:
<a href="Substance/acide_flavodique-4454.htm">acide flavodique</a>
Example input 2:
<a href="Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm">CIPROFLOXACINE ARROW 750 mg cp pellic</a>
What I need to write to my file is: 1- "acide flavodique" if the line matches the first example; 2- "ciprofloxacine" if it matches the second example. Is there a problem with my regular expression, or is it something else? Thanks in advance!
BufferedReader lire = new BufferedReader(new FileReader(file1));
do{
String line = lire.readLine();
if(line == null)
{
break;
}
Pattern p = Pattern.compile ("<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");
Matcher m = p.matcher(line); System.out.println("match:"+m.group(1)+"\n");
if (m.matches()) {
writer.write(line);
writer.write(System.getProperty("line.separator"));
}
}while(true);
// }
writer.close();
}}}
Upvotes: 1
Views: 67
Reputation:
Unless you care which one is found, I guess you could combine the two so all that's
needed is a single capture group 1.
# "<a\\s+href\\s*=\\s*\"\\s*(?:Substance|Medicament)/[^>]+>([\\s\\S]+?)</a>"
<a \s+ href \s* = \s* " \s*
(?: Substance | Medicament )
/ [^>]+
>
( [\s\S]+? ) # (1)
</a>
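A rough sketch of how the combined pattern above could be used from Java. The class name `CombinedLinkPattern` and the helper `linkText` are made up for illustration, and `find()` is used rather than `matches()` on the assumption that the tag may sit anywhere in the line:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CombinedLinkPattern {
    // The combined pattern from above: a non-capturing alternation for the
    // folder name and a single capture group for the link text.
    private static final Pattern LINK = Pattern.compile(
        "<a\\s+href\\s*=\\s*\"\\s*(?:Substance|Medicament)/[^>]+>([\\s\\S]+?)</a>");

    // Hypothetical helper: returns the link text, or null when the line
    // contains no Substance/Medicament link.
    static String linkText(String line) {
        Matcher m = LINK.matcher(line);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String line1 = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
        String line2 = "<a href=\"Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm\">"
                     + "CIPROFLOXACINE ARROW 750 mg cp pellic</a>";
        System.out.println(linkText(line1));
        System.out.println(linkText(line2));
    }
}
```

Note that group 1 holds the whole link text here, so for the second example it contains the full "CIPROFLOXACINE ARROW 750 mg cp pellic"; reducing that to the first word would be a separate step.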
Upvotes: 0
Reputation: 124724
The first problem:
Matcher m = p.matcher(line);
System.out.println("match:"+m.group(1)+"\n");
if (m.matches()) {
// ...
}
This doesn't work, because you must call m.matches() first, before calling m.group(1).
So this will be better:
Matcher m = p.matcher(line);
if (m.matches()) {
System.out.println("match:"+m.group(1)+"\n");
// ...
}
The second problem is with the groups. Given this pattern:
Pattern p = Pattern.compile("<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");
And these inputs:
String line1 = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
String line2 = "<a href=\"Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm\">CIPROFLOXACINE ARROW 750 mg cp pellic</a>";
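Both of these lines will match, but the matched part will be in different groups.
For line1, "acide flavodique" will be in .group(1), but for line2, "CIPROFLOXACINE ARROW 750 mg cp" will be in .group(2).
This is because your regular expression has two capturing groups (...), one in each alternative. Groups are numbered from left to right across the whole pattern, and the group belonging to the alternative that did not match returns null.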
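To see which group is populated for each input, here is a minimal sketch that keeps the pattern from the question unchanged and simply picks whichever group is non-null. The class name `GroupPicker` and the helper `capturedText` are hypothetical names used only for this illustration:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupPicker {
    // The pattern from the question, unchanged: one capture group per alternative.
    private static final Pattern P = Pattern.compile(
        "<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");

    // Hypothetical helper: returns the text captured by whichever alternative
    // matched, or null when the whole line does not match.
    static String capturedText(String line) {
        Matcher m = P.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // group(1) is null when the Medicament alternative matched, and vice versa.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        String line1 = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
        String line2 = "<a href=\"Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm\">"
                     + "CIPROFLOXACINE ARROW 750 mg cp pellic</a>";
        System.out.println(capturedText(line1)); // acide flavodique
        System.out.println(capturedText(line2)); // CIPROFLOXACINE ARROW 750 mg cp
    }
}
```

Checking for null keeps the calling code independent of which alternative matched.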
Upvotes: 1
Reputation: 22446
You are calling m.group(..) too early. You should call m.matches() first; otherwise you get an IllegalStateException.
And by the way, the pattern is found (at least the two examples you provide are matched).
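A small sketch that reproduces the exception, using only the first alternative of the pattern for brevity. The class name `GroupBeforeMatch` and the helper `throwsWithoutMatch` are invented for this example:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class GroupBeforeMatch {
    private static final Pattern P = Pattern.compile("<a href=\"Substance/.+>(.+)</a>");

    // Hypothetical helper: reports whether calling group(1) on a fresh Matcher,
    // before any match attempt, throws IllegalStateException.
    static boolean throwsWithoutMatch(String line) {
        Matcher m = P.matcher(line);
        try {
            m.group(1); // no matches()/find() call has been made yet
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String line = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
        System.out.println(throwsWithoutMatch(line)); // true

        // Same input, but with matches() called first: group(1) is now available.
        Matcher m = P.matcher(line);
        if (m.matches()) {
            System.out.println(m.group(1)); // acide flavodique
        }
    }
}
```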
Upvotes: 0
Reputation: 892
Your pattern being:
<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>
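This contains a few '/' characters. In regex flavors that use '/' as the pattern delimiter, these would have to be escaped; with Java's Pattern.compile the pattern is passed as a plain string, so '/' is an ordinary character and needs no escaping. You can test for such things here: https://www.regex101.com/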
Upvotes: 2