Reputation: 137

Notepad++ conditional replace

Task: Replace each CRLF with a space when the first alphabetic sequence of the line that follows is not all capitals.

Text I have:

FOO Bar 123 sometext BAR sometext
Foobar, sometext 123
FOOBAR^&%# sometext sometext 1234 5678
Bar 123 456 FOO 789
barfoobar sometext
BAR; sometext (*&%#) FOOBAR 123

Expected result:

FOO Bar 123 sometext BAR sometext Foobar, sometext 123
FOOBAR^&%# sometext sometext 1234 5678 Bar 123 456 FOO 789 barfoobar sometext
BAR; sometext (*&%#) FOOBAR 123

I forgot to mention (if it matters at all) that the source text is in Russian (Cyrillic, Windows-1251); a sample is below.

AБИДЖАН (Abidjan) , город и главный порт государства Кот-д'Ивуар,
Aдминистративный центр деп. Абиджан. Ок. 2 млн. жителей 
Aдм. ц. французской колонии Берег Слоновой Кости (БСК). В 1960-83 столица Государства БСК.

Thanks very much for any help.

Cheers,

Michael

Upvotes: 1

Answers (3)

Toto

Reputation: 91430

Ctrl+H
Find what: \R(?![A-ZА-Я]+\b)
Replace with: A single space
CHECK Match case
CHECK Wrap around
CHECK Regular expression
Replace all

Explanation:

\R                  # any kind of linebreak (i.e. \r, \n, \r\n)
(?!                 # negative lookahead, make sure what follows is not:
    [A-ZА-Я]+           # one or more capital Latin or Cyrillic letters
    \b                  # word boundary, make sure we match a whole word
)                   # end lookahead
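To try the same lookahead outside Notepad++, here is a minimal Python sketch. It rests on a few assumptions: Python's `re` module has no `\R`, so `\r?\n` stands in for it (a lone `\r` is not handled), and the function name `join_continuation_lines` is hypothetical, chosen only for illustration.

```python
import re

# Assumption: \r?\n replaces Notepad++'s \R, which Python's re module lacks.
# The negative lookahead is the one from the answer: the break is kept only
# when the next line starts with a whole word of capital Latin/Cyrillic letters.
PATTERN = re.compile(r"\r?\n(?![A-ZА-Я]+\b)")


def join_continuation_lines(text: str) -> str:
    """Replace every line break that is NOT followed by an all-caps word with a space."""
    return PATTERN.sub(" ", text)


sample = (
    "FOO Bar 123 sometext BAR sometext\r\n"
    "Foobar, sometext 123\r\n"
    "FOOBAR^&%# sometext sometext 1234 5678\r\n"
    "Bar 123 456 FOO 789\r\n"
    "barfoobar sometext\r\n"
    "BAR; sometext (*&%#) FOOBAR 123"
)

print(join_continuation_lines(sample))
```

The sample is the text from the question; the printed result should have three lines, each starting with an all-caps word.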

Upvotes: 0

Mchief

Reputation: 137

After a series of experiments I came up with a 3-step solution.

Search (\n[А-Я] ?[A-Я]+), replace with \n#$1 (https://regex101.com/r/nVHqUt/1).
Search \r\n, replace with space.
Search #\n, replace with \r\n.
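The same mark, join, restore idea can be sketched in Python. This is an adaptation and not the exact regexes above: it marks breaks with a lookahead instead of a capture group, and it uses a private-use character as the placeholder instead of `#`. The names `MARKER` and `join_with_marker` are hypothetical, and the sketch assumes the placeholder never occurs in the input and that line breaks are CRLF.

```python
import re

# Assumption: this private-use character never appears in the real text.
MARKER = "\ue000"


def join_with_marker(text: str) -> str:
    # Step 1: mark the breaks that must survive (next line starts with an all-caps word).
    marked = re.sub(r"\r\n(?=[A-ZА-Я]+\b)", MARKER, text)
    # Step 2: every remaining CRLF joins two lines, so it becomes a space.
    joined = marked.replace("\r\n", " ")
    # Step 3: turn the markers back into real line breaks.
    return joined.replace(MARKER, "\r\n")


print(join_with_marker("FOO one\r\ntwo three\r\nBAR four\r\nfive"))
```

A placeholder outside the normal character range avoids the risk that a literal `#` at the end of a line in the source text is mistaken for a marker in step 3.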

Thanks everyone for your help!

Upvotes: 1

speciesUnknown

Reputation: 1753

Use a regex replace with Unicode sequences

Open find and replace

Enable "Match case"

Set search mode to "Regular expression"

Find what: \r\n([\u0600-\u06FF]{0,1}[\u0061-\u007A]{1,})

Replace with: a space followed by $1 (the leading space is important)

Upvotes: 1
