Reputation: 23
In InDesign I was hoping [\l]{4}(?=\s)
would find the last four letters of each word, but the GREP did not work. I want to put the result in the page header as a suffix. I experimented with \b
and $
, but nothing worked. And http://regex101.com/r/uQ7xR3/1 does not work in InDesign, because it's PHP flavour.
There are also several additional conditions. If the 5th letter is h
, then we should take the last 5 letters of each word instead of 4. But we do not take anything separated by \s
, nor do we take ...
or anything inside |
(like | ā |
).
virūpacakṣus
dharmacakṣus
nayacakṣus
sūryacakṣus
divyacakṣus
saṃgrah
āsaṃgrah
upasaṃgrah
pratisaṃgrah
abhisaṃgrah
anusaṃgrah
Update. Let me add more restrictions. It is not just "h": if any of these combinations occurs, kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh
, take the last 5 letters instead of only the last 4. The same goes for ai|au: they should not be split.
General case:
1) From vṛddhāpacāyitva
take itva
.
Two exceptions:
2) From nakhāli
take khāli
instead of just hāli
, because kh
is treated as a single letter in Devanagari script. The same applies to kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh
.
From mirikha
take rikha
instead of just ikha
, because kh
is treated as a single letter in Devanagari script. The same applies to kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh
.
3) From mahahrauḍ
take hrauḍ
instead of just rauḍ
, because au
is treated as a single letter in Devanagari script, so each of ai|au counts as a single letter.
From ekaikaivat
take aivat
instead of just ivat
, because ai
is treated as a single letter in Devanagari script, so each of ai|au counts as a single letter.
Upvotes: 0
Views: 930
Reputation: 60464
Perhaps try:
[[:alpha:]]{4}h?\b
For your additional qualifications, you can try:
(?:ai|au|kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh|[[:alpha:]]){4}h?\b
As before, you will need to replace the POSIX class for letters with whatever the equivalent token is in InDesign.
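A minimal sketch of how this second pattern behaves on the cases from the question, written in Python rather than InDesign GREP purely as a way to check it. It assumes `[^\W\d_]` as a stand-in for the letter class (Python's `re` has no `[[:alpha:]]`), and the helper name `suffix` is made up for illustration. Python's `re` is a different engine, so the result should still be verified in InDesign itself.

```python
import re
import unicodedata

# Digraphs that the question wants counted as a single letter.
# NFC normalisation keeps letters such as ṭ and ḍ as single code points.
UNITS = unicodedata.normalize("NFC", "ai|au|kh|gh|ch|jh|ṭh|ḍh|th|dh|ph|bh")

# Assumption: [^\W\d_] approximates "any letter" in Python's re module.
SUFFIX = re.compile(rf"(?:{UNITS}|[^\W\d_]){{4}}h?\b")

def suffix(word):
    """Return the first match of the pattern in `word`, or None."""
    word = unicodedata.normalize("NFC", word)
    m = SUFFIX.search(word)
    return m.group(0) if m else None

for w in ["vṛddhāpacāyitva", "nakhāli", "mirikha", "mahahrauḍ", "ekaikaivat"]:
    print(w, "->", suffix(w))
# vṛddhāpacāyitva -> itva
# nakhāli -> khāli
# mirikha -> rikha
# mahahrauḍ -> hrauḍ
# ekaikaivat -> aivat
```

Because the search runs left to right, the first position from which four units reach the word boundary wins, which is why a digraph at the start of the suffix is kept whole instead of being cut in half.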
Upvotes: 0
Reputation: 22478
Be careful when stating "it does not work", and the reasoning behind it. Your initial GREP [\l]{4}(?=\s)
does work in InDesign (although the [..]
are superfluous).
Similarly, the linked \w\w\w\w$
also works, and it has nothing to do with "PHP flavor". The reason only the last occurrence is highlighted is that (1) the $
anchors to end-of-story only, and adding the m
multi-line flag makes it work for individual lines, (2) with m
only the first instance will be highlighted (the default) and you need g
to get them all, but most importantly, (3) \w
in a general GREP parser may not be Unicode-aware, and in this case you can see it isn't because \w
does not pick up the ṃ
and ṣ
. InDesign's GREP, on the other hand, is Unicode-aware.
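The three points can be reproduced in Python's `re` module, as a rough analogy only (it is neither regex101's PCRE nor InDesign's GREP). Here `re.MULTILINE` plays the role of `m`, `re.findall` versus `re.search` stands in for `g`, and `re.ASCII` imitates a `\w` that is not Unicode-aware. The helper `last_four` and the two-word sample are chosen for illustration.

```python
import re
import unicodedata

# Two of the sample words, one per line (NFC keeps ṣ and ṃ as single code points).
SAMPLE = unicodedata.normalize("NFC", "virūpacakṣus\nsaṃgrah")

def last_four(text, flags=0):
    """Return every match of 'four word characters followed by $'."""
    return re.findall(r"\w{4}$", text, flags)

print(last_four(SAMPLE))               # $ matches only at the very end: ['grah']
print(last_four(SAMPLE, re.M))         # $ matches at each line end: ['kṣus', 'grah']
print(last_four(SAMPLE, re.M | re.A))  # ASCII-only \w cannot cross ṣ: ['grah']

# Without a "global" search, only the first match is reported.
print(re.search(r"\w{4}$", SAMPLE, re.M).group(0))  # kṣus
```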
The following expression will work on the specific examples you supplied; the other "single letter" combinations can possibly be added in a similar way.
(au|ai|kh|\l){4}h?\b
When applied to your sample words:
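The highlighting can be approximated outside InDesign with a rough Python sketch that wraps each match in square brackets. It assumes `[^\W\d_]` in place of `\l` (so, unlike `\l`, it also accepts uppercase letters), and `highlight` is a hypothetical helper name. Python's `re` is not InDesign's GREP, so this only illustrates how the pattern is expected to behave.

```python
import re
import unicodedata

# The expression above, with [^\W\d_] standing in for InDesign's \l.
PATTERN = re.compile(r"(?:au|ai|kh|[^\W\d_]){4}h?\b")

def highlight(text):
    """Wrap every match in [ ] to mimic the highlight in Find/Change."""
    text = unicodedata.normalize("NFC", text)
    return PATTERN.sub(lambda m: f"[{m.group(0)}]", text)

WORDS = """virūpacakṣus
dharmacakṣus
nayacakṣus
sūryacakṣus
divyacakṣus
saṃgrah
āsaṃgrah
upasaṃgrah
pratisaṃgrah
abhisaṃgrah
anusaṃgrah"""

print(highlight(WORDS))
```

In this sketch every word ending in cakṣus is marked as `[kṣus]`, and every word ending in saṃgrah as `[ṃgrah]`, since the trailing `h?` adds a fifth letter when the word ends in h.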
Upvotes: 1