gasyoun

Reputation: 23

GREP to match the last {4} letters at the end of each word

In InDesign I was hoping that [\l]{4}(?=\s) would find the last four letters of each word, but the GREP did not work. I want to put the match in the page header as the suffix. I experimented with \b and $, but nothing worked. And http://regex101.com/r/uQ7xR3/1 does not work in InDesign, because it's PHP flavour.

There are several additional conditions. If the 5th letter is h, then we should take the last 5 letters of each word instead of 4. But we do not take anything separated by an \s, nor do we take ... or anything inside | (like | ā |).

virūpacakṣus dharmacakṣus nayacakṣus sūryacakṣus divyacakṣus saṃgrah āsaṃgrah upasaṃgrah pratisaṃgrah abhisaṃgrah anusaṃgrah

Update: let me add more restrictions. This applies not just to a single "h": if any of the combinations kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh occurs, take the last 5 letters rather than only the last 4. The same goes for ai|au, which should not be split.

General case:
1) From vṛddhāpacāyitva take itva.

Two exceptions:
2) From nakhāli take khāli instead of just hāli, because kh is treated like a single letter in Devanagari script. The same holds for kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh. Likewise, from mirikha take rikha instead of just ikha.
3) From mahahrauḍ take hrauḍ instead of just rauḍ, because au is treated like a single letter in Devanagari script; each of ai|au counts as a single letter. Likewise, from ekaikaivat take aivat instead of just ivat.

Upvotes: 0

Views: 930

Answers (2)

Ron Rosenfeld

Reputation: 60464

Perhaps try:

[[:alpha:]]{4}h?\b

For your additional qualifications, you can try:

 (?:ai|au|kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh|[[:alpha:]]){4}h?\b

Again, as before, you will need to replace the POSIX class for letters with whatever the equivalent token is in InDesign.
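For anyone who wants to check the idea outside InDesign, here is a minimal Python sketch of the same expression. It is an illustration only, not InDesign syntax: Python's `re` has no POSIX classes, so `[^\W\d_]` stands in for "any letter", and the names `DIGRAPHS`, `SUFFIX_RE` and `suffixes` are made up for this example. The text is normalized to NFC on the assumption that the diacritics should be single precomposed characters.

```python
import re
import unicodedata

# Combinations that the question says must count as one letter.
DIGRAPHS = ["ai", "au", "kh", "gh", "ch", "jh", "ṭh", "ḍh", "th", "dh", "ph", "bh"]

# One "unit" is a digraph or, failing that, any single Unicode letter.
# [^\W\d_] is a Python stand-in for the POSIX [[:alpha:]] class.
UNIT = "(?:" + "|".join(DIGRAPHS) + r"|[^\W\d_])"

# Four units, an optional trailing h, then a word boundary.
SUFFIX_RE = re.compile(unicodedata.normalize("NFC", UNIT + r"{4}h?\b"))


def suffixes(text):
    """Return the matched ending of every word in text (hypothetical helper)."""
    # Normalize so that letters such as ṣ or ḍ are single code points.
    return SUFFIX_RE.findall(unicodedata.normalize("NFC", text))


if __name__ == "__main__":
    print(suffixes("vṛddhāpacāyitva nakhāli mirikha mahahrauḍ ekaikaivat"))
```

On the five words from the question's "general case" list this should give itva, khāli, rikha, hrauḍ and aivat, because the leftmost position from which four units reach the end of the word is the one that keeps each digraph whole.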

Upvotes: 0

Jongware

Reputation: 22478

Be careful when stating "it does not work", and the reasoning behind it. Your initial GREP [\l]{4}(?=\s) does work in InDesign (although the [..] are superfluous).

Similarly, the linked \w\w\w\w$ also works, and it has nothing to do with "php flavor". The reason only the last occurrence is highlighted is that (1) the $ matches at end-of-story only, and adding the m multi-line flag makes it work for individual lines; (2) with m alone, only the first instance is highlighted (the default), and you need g to get them all; but most importantly, (3) \w in a general GREP parser may not be Unicode-aware, and in this case you can see it isn't, because \w does not pick up the letters with diacritics (the ṣ in cakṣus, for example). InDesign's GREP, on the other hand, is Unicode-aware.
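The effect of these flags can be reproduced in Python as a rough analogy. This is a sketch, not the regex101/PHP setup itself: Python has no g flag (`re.findall` always returns every match), and its \w is Unicode-aware unless `re.ASCII` is passed, so that flag is used here to imitate a non-Unicode-aware parser. The two sample words are written with escapes so that ū, ṣ and ṃ are guaranteed to be single characters.

```python
import re

# Two sample words on separate lines; \u016b is ū, \u1e63 is ṣ, \u1e43 is ṃ.
text = "vir\u016bpacak\u1e63us\nsa\u1e43grah"

# Without MULTILINE, $ only matches at the very end of the whole text.
end_only = re.findall(r"\w{4}$", text)

# MULTILINE makes $ match at the end of every line.
per_line = re.findall(r"\w{4}$", text, flags=re.MULTILINE)

# ASCII restricts \w to [A-Za-z0-9_], so the ending "kṣus" no longer matches.
ascii_only = re.findall(r"\w{4}$", text, flags=re.MULTILINE | re.ASCII)

print(end_only)    # only the ending of the last line
print(per_line)    # one ending per line
print(ascii_only)  # the line ending in "ṣus" is skipped
```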

The following expression will work on the specific examples you supplied; the other "single letter" combinations can possibly be added in a similar way.

(au|ai|kh|\l){4}h?\b

When applied to your sample words:

[image: grep with complications]
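As a rough cross-check outside InDesign, the same expression can be tried in Python on the sample words from the question. This is only an approximation: InDesign's \l (any lowercase letter) has no direct equivalent in Python's `re`, so `[^\W\d_]` (any letter) is substituted, and the names `SAMPLE`, `PATTERN` and `endings` are invented for the sketch.

```python
import re
import unicodedata

# Sample words from the question, normalized so each diacritic letter
# is a single precomposed character.
SAMPLE = unicodedata.normalize(
    "NFC",
    "virūpacakṣus dharmacakṣus nayacakṣus sūryacakṣus divyacakṣus "
    "saṃgrah āsaṃgrah upasaṃgrah pratisaṃgrah abhisaṃgrah anusaṃgrah",
)

# (au|ai|kh|\l){4}h?\b with [^\W\d_] standing in for InDesign's \l.
PATTERN = re.compile(r"(?:au|ai|kh|[^\W\d_]){4}h?\b")

# One matched ending per word, in order of appearance.
endings = PATTERN.findall(SAMPLE)

for word, ending in zip(SAMPLE.split(), endings):
    print(word, "->", ending)
```

With this substitution the words in -cakṣus should end up with kṣus, and the words in -saṃgrah with ṃgrah, since the optional h adds a fifth letter there.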

Upvotes: 1
