Reputation: 137
Task: Replace each CRLF with a space when the line that follows does not start with an all-capitals alphabetic sequence.
Text I have:
FOO Bar 123 sometext BAR sometext
Foobar, sometext 123
FOOBAR^&%# sometext sometext 1234 5678
Bar 123 456 FOO 789
barfoobar sometext
BAR; sometext (*&%#) FOOBAR 123
Expected result:
FOO Bar 123 sometext BAR sometext Foobar, sometext 123
FOOBAR^&%# sometext sometext 1234 5678 Bar 123 456 FOO 789 barfoobar sometext
BAR; sometext (*&%#) FOOBAR 123
I forgot to mention (if it matters at all) that the source text is in Russian (Cyrillic, Windows-1251); a sample is below.
AБИДЖАН (Abidjan) , город и главный порт государства Кот-д'Ивуар,
Aдминистративный центр деп. Абиджан. Ок. 2 млн. жителей
Aдм. ц. французской колонии Берег Слоновой Кости (БСК). В 1960-83 столица Государства БСК.
Thanks very much for any help.
Cheers,
Michael
Upvotes: 1
Views: 192
Reputation: 91430
Find what: \R(?![A-ZА-Я]+\b)
Replace with: a single space
Explanation:
\R # any kind of linebreak (i.e. \r, \n, \r\n)
(?! # negative lookahead, make sure what follows is not:
[A-ZА-Я]+ # Capital Latin & Cyrillic letters
\b # word boundary, make sure we match a whole word
) # end lookahead
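For anyone who wants to try the same idea outside the editor, here is a minimal Python sketch of the negative-lookahead approach. It is an illustration, not part of the original answer: Python's `re` module has no `\R`, so `\r?\n` stands in for it (assuming CRLF or LF line endings only), the Cyrillic range is written as `\u0410-\u042F` (А-Я), and the name `join_lines` is hypothetical.

```python
import re

# \R is not available in Python's re module, so r"\r?\n" stands in for it.
# Assumption: the input uses CRLF or LF line endings, never a lone CR.
# \u0410-\u042F is the Cyrillic capital range А-Я.
LINE_JOIN = re.compile(r"\r?\n(?![A-Z\u0410-\u042F]+\b)")


def join_lines(text: str) -> str:
    """Replace a line break with a space unless the next line
    starts with a whole word made only of capital letters."""
    return LINE_JOIN.sub(" ", text)


# Sample lines taken from the question.
sample = "\r\n".join([
    "FOO Bar 123 sometext BAR sometext",
    "Foobar, sometext 123",
    "FOOBAR^&%# sometext sometext 1234 5678",
    "Bar 123 456 FOO 789",
    "barfoobar sometext",
    "BAR; sometext (*&%#) FOOBAR 123",
])

if __name__ == "__main__":
    print(join_lines(sample))
```

Using `\r?\n` rather than an alternation such as `\r\n|\r|\n` is deliberate: with the alternation, the engine could backtrack and match the `\r` alone when the lookahead rejects the full `\r\n`, which would split a CRLF in half.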
Upvotes: 0
Reputation: 137
After a series of experiments I arrived at a 3-step solution.
1. Find (\n[А-Я] ?[A-Я]+), replace with \n#$1 (https://regex101.com/r/nVHqUt/1).
2. Find \r\n, replace with a space.
3. Find #\n, replace with \r\n.
Thanks everyone for your help!
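The three steps can also be traced in a short Python sketch, which may help to see why the temporary `#` marker is needed. This is only an illustration under assumptions: both character classes are taken to mean Cyrillic capitals (written as `\u0410-\u042F`), the text uses CRLF line endings, `#` never appears directly before a line feed in the input, and the name `three_step_join` is hypothetical.

```python
import re

# Assumption: both classes in the step-1 pattern are Cyrillic capitals А-Я.
CAPS_START = re.compile(r"(\n[\u0410-\u042F] ?[\u0410-\u042F]+)")


def three_step_join(text: str) -> str:
    """Mimic the three find/replace passes on CRLF-terminated text."""
    # Step 1: put a "#" marker line before every line that starts with capitals.
    # "\r\nКИЕВ" becomes "\r\n#\nКИЕВ".
    marked = CAPS_START.sub("\n#\\1", text)
    # Step 2: turn every remaining CRLF into a space.
    # The bare "\n" after each marker is not a CRLF, so it survives.
    joined = marked.replace("\r\n", " ")
    # Step 3: turn each marker plus line feed back into a real CRLF.
    return joined.replace("#\n", "\r\n")


if __name__ == "__main__":
    # Hypothetical sample lines, not taken from the thread.
    demo = "МОСКВА город\r\nстолица России\r\nКИЕВ город"
    print(three_step_join(demo))
```

Note that step 2 leaves a trailing space before each restored line break, because the CRLF in front of the marker is also replaced with a space.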
Upvotes: 1
Reputation: 1753
Open find and replace
Enable "Match case"
Set search mode to "Regular expression"
Find what: \r\n([\u0600-\u06FF]{0,1}[\u0061-\u007A]{1,})
Replace with: a space followed by $1
(the leading space is important)
Upvotes: 1