d-b

Reputation: 971

Regular expressions, Greek characters and the *-quantifier doesn't work (but the +-quantifier does)?

I use this regular expression [\p{Greek}] to match any Greek character. It works as expected and matches the first Greek character on the line. However, I want to match all Greek characters that follow that first character, but the *-quantifier doesn't seem to work for Greek characters.

This is my input data. Each line consists of three spaces, a double quote, a Greek or Latin string containing one or more spaces, a closing ", a comma, and a newline.

   "ξηλξλκξ λκλξλξ",
   "lkjlkj kjljl",
   "δδσασα ασδ ασδφ",
   "xxaax asdsd dsds",
   "δερεφε αδσφδσ",

a. ^.*?[\p{Greek}|\s] - just matches the first space on all lines.

b. ^.*?[\p{Greek}|\s]+ - matches all three initial spaces, on all lines.

c. ^.*?"[\p{Greek}|\s]+ - matches the whole line when it is written with Greek characters

d. ^.*?"[\p{Greek}|\s]* - matches the initial spaces and the " on the Latin lines and the whole line excluding the ", at the end on the Greek lines.

e. [\p{Greek}]* - matches all characters on the Latin lines, but just one at a time (in spite of the *). On the Greek lines it matches the initial spaces, one at a time, but not the first ". Then it matches the first word, but not the space between the words.
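To make (a) through (d) reproducible outside BBEdit, here is a minimal Python sketch. It rests on two assumptions: Python's re module has no \p{Greek}, so the Greek and Coptic block U+0370..U+03FF stands in for it, and Python's engine may not behave exactly like BBEdit's. The names GREEK, PATTERNS, LINES and first_match are made up for this illustration.

```python
import re

# Assumption: the range U+0370..U+03FF approximates \p{Greek},
# which Python's built-in re module does not support.
GREEK = r"\u0370-\u03FF"

# The four patterns from (a) to (d). Note that inside [...] the | is
# a literal pipe character, not alternation.
PATTERNS = {
    "a": rf'^.*?[{GREEK}|\s]',
    "b": rf'^.*?[{GREEK}|\s]+',
    "c": rf'^.*?"[{GREEK}|\s]+',
    "d": rf'^.*?"[{GREEK}|\s]*',
}

# One Greek and one Latin line from the input data above.
LINES = [
    '   "ξηλξλκξ λκλξλξ",',
    '   "lkjlkj kjljl",',
]


def first_match(label, line):
    """Return the text matched by pattern `label` on `line`, or None."""
    m = re.search(PATTERNS[label], line)
    return m.group(0) if m else None


for label in PATTERNS:
    for line in LINES:
        print(label, repr(first_match(label, line)))
```

Printing with repr makes the matched leading spaces visible, which is easy to miss in an editor's highlighting.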

(e) is super confusing. If I do a search-and-replace using that regular expression on the string "XYZ NOP" and replace each match with A, one at a time ("replace and find next"), the result looks like this: A A"XAYZA NAOPA",A. However, if I perform a "replace all", this is the result: ´A A A A"AXAYAZA ANAOAPA"A,´. All the original characters remain, in spite of the search-and-replace, with As inserted more or less randomly.
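The "replace all" experiment can also be tried outside BBEdit. This is a rough Python sketch, not BBEdit's engine, so details may differ. It assumes the range U+0370..U+03FF as a stand-in for \p{Greek} (which Python's re module lacks) and assumes the test line is formatted like the input data, with three leading spaces, quotes and a trailing comma. The names GREEK_STAR, GREEK_PLUS and SAMPLE are made up for this illustration.

```python
import re

# Assumption: U+0370..U+03FF approximates \p{Greek}.
GREEK_STAR = r"[\u0370-\u03FF]*"   # zero or more Greek characters
GREEK_PLUS = r"[\u0370-\u03FF]+"   # one or more Greek characters

# Assumed shape of the test line: same layout as the input data.
SAMPLE = '   "XYZ NOP",'

# Every match the * pattern finds on a line without Greek characters.
star_matches = re.findall(GREEK_STAR, SAMPLE)

# "Replace all" with each quantifier.
star_result = re.sub(GREEK_STAR, "A", SAMPLE)
plus_result = re.sub(GREEK_PLUS, "A", SAMPLE)

print(len(star_matches), star_matches)
print(repr(star_result))
print(repr(plus_result))
```

In this sketch every match of the * pattern on the Latin line is an empty string, so nothing is removed and an A is only inserted at each match position, while the + pattern finds nothing to replace.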

I have no idea what is going on here.

A couple of questions here:

  1. Why does ^.*? just match the first spaces in (a) and not the " in (b)?
  2. Why do + and * give different results?
  3. (e) - what to say!? I don't understand the *-quantifier's behaviour here.

I am using BBEdit for this. I have used BBEdit with regular expressions since the 90s and have never encountered any issues with its regexp implementation. But OTOH, I have never tried working with Greek characters before.

Upvotes: 0

Views: 89

Answers (0)
