Mohamed Benmahdjoub

Reputation: 1290

My regular expression isn't matching anything in my Java program

I'm trying to match one of the following two examples:

Example input 1:

<a href="Substance/acide_flavodique-4454.htm">acide flavodique</a>

Example input 2:

<a href="Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm">CIPROFLOXACINE ARROW 750 mg cp pellic</a>

What I need to print to my file is: 1- acide flavodique, if the line matches the first example. 2- ciprofloxacine, if the line matches the second example. Is there a problem with my regular expression, or with something else? Thanks in advance!

            BufferedReader lire = new BufferedReader(new FileReader(file1));
            do {
                String line = lire.readLine();
                if (line == null) {
                    break;
                }
                Pattern p = Pattern.compile("<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");
                Matcher m = p.matcher(line);
                System.out.println("match:" + m.group(1) + "\n");
                if (m.matches()) {
                    writer.write(line);
                    writer.write(System.getProperty("line.separator"));
                }
            } while (true);
            writer.close();
        }}}

Upvotes: 1

Views: 67

Answers (4)

user557597

Unless you need to know which of the two alternatives matched, you can combine them
so that a single capture group (group 1) is all you need.

 #  "<a\\s+href\\s*=\\s*\"\\s*(?:Substance|Medicament)/[^>]+>([\\s\\S]+?)</a>"

 <a \s+ href \s* = \s* " \s* 
 (?: Substance | Medicament )
 / [^>]+ 
 >
 ( [\s\S]+? )                  # (1)
 </a>
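
For what it's worth, here is a minimal sketch of how the combined pattern could be used from Java. The class and method names (CombinedLinkPattern, linkText) are made up for illustration, and find() is used on the assumption that the link may be surrounded by other text on the line.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class CombinedLinkPattern {
    // The combined pattern from above: one alternation, one capture group.
    private static final Pattern LINK = Pattern.compile(
        "<a\\s+href\\s*=\\s*\"\\s*(?:Substance|Medicament)/[^>]+>([\\s\\S]+?)</a>");

    // Returns the link text, or null when the line contains no such link.
    // linkText is a hypothetical helper name, not part of any library.
    static String linkText(String line) {
        Matcher m = LINK.matcher(line);
        // find() must succeed before group(1) may be read.
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        System.out.println(linkText(
            "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>"));
        System.out.println(linkText(
            "<a href=\"Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm\">"
            + "CIPROFLOXACINE ARROW 750 mg cp pellic</a>"));
    }
}
```

Because both kinds of link land in group 1, the caller never has to check which alternative matched.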

Upvotes: 0

janos

Reputation: 124724

The first problem:

   Matcher m = p.matcher(line); 
   System.out.println("match:"+m.group(1)+"\n");
   if (m.matches()) {
       // ...
   }

This doesn't work because m.group(1) is only valid after a successful match; calling it before m.matches() throws an IllegalStateException. This is better:

   Matcher m = p.matcher(line); 
   if (m.matches()) {
       System.out.println("match:"+m.group(1)+"\n");
       // ...
   }

The second problem is with the groups. Given this pattern:

Pattern p = Pattern.compile("<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");

And these inputs:

String line1 = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
String line2 = "<a href=\"Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm\">CIPROFLOXACINE ARROW 750 mg cp pellic</a>";

Both of these lines will match, but the matched part will be in different groups. For line1, "acide flavodique" will be in .group(1), but for line2, "CIPROFLOXACINE ARROW 750 mg cp" will be in .group(2) and .group(1) will be null. This is because your regular expression has two capturing groups (...), one in each alternative, and only the group in the alternative that matched gets a value.
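
One way to cope with the two groups is to read whichever one took part in the match. This is only a sketch that keeps the original pattern unchanged; the class and method names (GroupSelection, captured) are illustrative.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GroupSelection {
    // The pattern from the question, unchanged.
    private static final Pattern P = Pattern.compile(
        "<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");

    // Returns the text of whichever group matched, or null if the line
    // does not match at all. captured is a hypothetical helper name.
    static String captured(String line) {
        Matcher m = P.matcher(line);
        if (!m.matches()) {
            return null;
        }
        // A group in the alternative that did not match returns null.
        return m.group(1) != null ? m.group(1) : m.group(2);
    }

    public static void main(String[] args) {
        String line1 = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
        String line2 = "<a href=\"Medicament/ciprofloxacine_arrow_750_mg_cp_pellic-71371.htm\">"
            + "CIPROFLOXACINE ARROW 750 mg cp pellic</a>";
        System.out.println(captured(line1));
        System.out.println(captured(line2));
    }
}
```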

Upvotes: 1

Eyal Schneider

Reputation: 22446

You are calling m.group(..) too early. You should call m.matches() first; otherwise you get an IllegalStateException.

By the way, the pattern itself does match (at least it matches the two examples you provided).
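
To see both points in one place, here is a rough sketch of the reading loop with the calls in the right order. It reads from a String instead of a file so that it runs on its own; the names MatchThenGroup, filter and groupBeforeMatchThrows are made up for the example.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class MatchThenGroup {
    // Compiled once, outside the loop; same pattern as in the question.
    private static final Pattern P = Pattern.compile(
        "<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>");

    // Keeps only the lines that match the pattern in full.
    static String filter(String input) {
        StringBuilder out = new StringBuilder();
        try (BufferedReader in = new BufferedReader(new StringReader(input))) {
            String line;
            while ((line = in.readLine()) != null) {
                Matcher m = P.matcher(line);
                if (m.matches()) { // only after this is m.group(..) legal
                    out.append(line).append(System.lineSeparator());
                }
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toString();
    }

    // Shows that group() before any match attempt throws.
    static boolean groupBeforeMatchThrows(String line) {
        try {
            P.matcher(line).group(1);
            return false;
        } catch (IllegalStateException e) {
            return true;
        }
    }

    public static void main(String[] args) {
        String link = "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>";
        System.out.println(groupBeforeMatchThrows(link));
        System.out.print(filter("some other line\n" + link + "\n"));
    }
}
```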

Upvotes: 0

John Cipponeri

Reputation: 892

Your pattern being:

<a href=\"Substance/.+>(.+)</a>|<a href=\"Medicament/.+>(.+)\\s+.+</a>

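This contains a few '/' characters. In tools that wrap the pattern in '/' delimiters (the default flavor on regex101 does), these count as unescaped delimiters and have to be escaped before the pattern will work there. Java's Pattern.compile takes a plain string with no delimiters, so '/' is an ordinary character in your program. You can test for such things here: https://www.regex101.com/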
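
Whether '/' needs escaping depends on the regex flavor, and java.util.regex has no pattern delimiters. A quick way to check this in Java itself is a small test like the following sketch (SlashCheck and matchesUnescaped are illustrative names):

```java
import java.util.regex.Pattern;

class SlashCheck {
    // '/' is written without a backslash here; java.util.regex treats it
    // as a literal character.
    static boolean matchesUnescaped(String input) {
        return Pattern.matches("<a href=\"Substance/.+>(.+)</a>", input);
    }

    public static void main(String[] args) {
        System.out.println(matchesUnescaped(
            "<a href=\"Substance/acide_flavodique-4454.htm\">acide flavodique</a>"));
    }
}
```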

Upvotes: 2
