Regex pattern fails to match in Python

Question

The following regex pattern does not match the way I expect it to.

The text fragment is:

txt = 'ΠΡΟΣ:  ΚΟΙΝ. :     Αθήνα,  16 - 10 - 2013  Αριθµ. Πρωτ. :  Κ2 – 6376 (δις)   ΤΟ ΕΘΝΙΚΟ ΤΥΠΟΓΡΑΦΕΙΟ  Καποδιστρίου 34  104 32 ΑΘΗΝΑ   Συνηµ. : ∆ιπλότ. Νο Η 3833967/2013      «ΚΥΠΡΟΥ ΑΕ∆ΑΚ»    Φειδιππίδου 26 & Χαλκηδόνος   11527  ΑΘΗΝΑ   ΑΝΑΚΟΙΝΩΣΗ  Καταχώρισης στο Γενικό Εµπορικό Μητρώο στοιχείων της ανώνυµης εταιρείας µε την επωνυµία  «ΚΥΠΡΟΥ ASSET MANAGEMENT ΑΝΩΝΥΜΗ ΕΤΑΙΡΕΙΑ ∆ΙΑΧΕΙΡΙΣΕΩΣ ΑΜΟΙΒΑΙΩΝ ΚΕΦΑΛΑΙΩΝ».  Ο ΥΦΥΠΟΥΡΓΟΣ ΑΝΑΠΤΥΞΗΣ  ΚΑΙ   ΑΝΤΑΓΩΝΙΣΤΙΚΟΤΗΤΑΣ  '

The regex pattern is:

import re
epwnymia_pattern = re.compile(r'επωνυμία\s+«([$?\w+\s*$?]+)»')
epwnymia = epwnymia_pattern.search(txt).group(1)  # Fails

I would expect to match the following phrase:

ΚΥΠΡΟΥ ASSET MANAGEMENT ΑΝΩΝΥΜΗ ΕΤΑΙΡΕΙΑ ∆ΙΑΧΕΙΡΙΣΕΩΣ ΑΜΟΙΒΑΙΩΝ ΚΕΦΑΛΑΙΩΝ

Why does the regex fail, and how should I correct it?

Wiktor Stribiżew · Accepted Answer

It seems to me the easiest way to match that string is to use a negated character class, [^«»], which matches any character except « and ». You also need case-insensitive matching:

(?i)επωνυμία\s+«([^«»]+)»

See the regex demo
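To illustrate why the negated class helps, here is a minimal sketch. It rests on an assumption that is not confirmed in the question: that the extracted text spells the capital delta as U+2206 (INCREMENT, a math symbol) rather than U+0394 (GREEK CAPITAL LETTER DELTA), which is common in text pulled from PDFs. Note also that inside [...] the characters $, ?, + and * are plain literals, not anchors or quantifiers. The variable names are made up for the example.

```python
import re
import unicodedata

# Assumption: the delta in the extracted text is U+2206 (INCREMENT),
# not U+0394 (GREEK CAPITAL LETTER DELTA). They look almost identical.
name = "ΚΥΠΡΟΥ ΑΕ\u2206ΑΚ"
quoted = "«" + name + "»"

# Inside [...] this class means: "$", "?", "+", "*",
# any word character, or any whitespace character.
original_class = re.compile(r"«([$?\w+\s*$?]+)»")
# The negated class accepts anything that is not a guillemet.
negated_class = re.compile(r"«([^«»]+)»")

print(unicodedata.name("\u2206"))       # INCREMENT
print(re.fullmatch(r"\w", "\u2206"))    # None: a math symbol is not a word character
print(original_class.search(quoted))    # None: the class stops at the symbol
print(negated_class.search(quoted).group(1))
```

Because [^«»] does not care which characters appear between the quotes, it keeps working even when the text contains look-alike symbols that \w rejects.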

Also, when you want to get the first match with re.search, it makes sense to check that there was a match at all before accessing the Group 1 value:

import re
txt = r'ΠΡΟΣ:  ΚΟΙΝ. :     Αθήνα,  16 - 10 - 2013  Αριθµ. Πρωτ. :  Κ2 – 6376 (δις)   ΤΟ ΕΘΝΙΚΟ ΤΥΠΟΓΡΑΦΕΙΟ  Καποδιστρίου 34  104 32 ΑΘΗΝΑ   Συνηµ. : ∆ιπλότ. Νο Η 3833967/2013      «ΚΥΠΡΟΥ ΑΕ∆ΑΚ»    Φειδιππίδου 26 & Χαλκηδόνος   11527  ΑΘΗΝΑ   ΑΝΑΚΟΙΝΩΣΗ  Καταχώρισης στο Γενικό Εµπορικό Μητρώο στοιχείων της ανώνυµης εταιρείας µε την επωνυµία  «ΚΥΠΡΟΥ ASSET MANAGEMENT ΑΝΩΝΥΜΗ ΕΤΑΙΡΕΙΑ ∆ΙΑΧΕΙΡΙΣΕΩΣ ΑΜΟΙΒΑΙΩΝ ΚΕΦΑΛΑΙΩΝ».  Ο ΥΦΥΠΟΥΡΓΟΣ ΑΝΑΠΤΥΞΗΣ  ΚΑΙ   ΑΝΤΑΓΩΝΙΣΤΙΚΟΤΗΤΑΣ  '
epwnymia_pattern = re.compile(r'(?i)επωνυμία\s+«([^«»]+)»')
epwnymia = epwnymia_pattern.search(txt)
if epwnymia:
    print(epwnymia.group(1))

See the Python demo online, output:

ΚΥΠΡΟΥ ASSET MANAGEMENT ΑΝΩΝΥΜΗ ΕΤΑΙΡΕΙΑ ∆ΙΑΧΕΙΡΙΣΕΩΣ ΑΜΟΙΒΑΙΩΝ ΚΕΦΑΛΑΙΩΝ
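
As for why the (?i) flag makes a difference even though the keyword is already lowercase, one plausible explanation is sketched below. It assumes the extracted text contains U+00B5 (MICRO SIGN) where the pattern has U+03BC (GREEK SMALL LETTER MU); the two render almost identically. The code uses explicit escapes so the difference is visible, and the helper name code_point is made up for the example.

```python
import re
import unicodedata

# Assumption: the text was extracted with MICRO SIGN (U+00B5),
# while the pattern was typed with GREEK SMALL LETTER MU (U+03BC).
word_in_text = "επωνυ\u00b5ία"
word_in_pattern = "επωνυ\u03bcία"

def code_point(ch):
    """Return the code point and Unicode name of a single character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch, 'UNKNOWN')}"

print(code_point(word_in_text[5]))      # U+00B5 MICRO SIGN
print(code_point(word_in_pattern[5]))   # U+03BC GREEK SMALL LETTER MU

# A case-sensitive search compares code points, so the spellings differ.
print(re.search(word_in_pattern, word_in_text))   # None

# Both characters share the same uppercase form, so Python's re
# treats them as equivalent under case-insensitive matching.
print(re.search("(?i)" + word_in_pattern, word_in_text) is not None)   # True

# Alternative: NFKC normalization maps MICRO SIGN to GREEK SMALL LETTER MU.
normalized = unicodedata.normalize("NFKC", word_in_text)
print(normalized == word_in_pattern)    # True
```

If that assumption holds, normalizing the text before searching is another option, although NFKC also rewrites other compatibility characters, so it is worth checking that this is acceptable for the rest of the document.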
