Reputation: 240
Often I'm copying text out of a PDF or similar and the line breaks aren't the way I want them. Instead of many short lines within each paragraph, I want each paragraph to be a single line of text, with a blank line between paragraphs.
Thanks to other answers on here, I can fix this with regex in just a few steps:
1. Find double line breaks with [\r\n][\r\n] and replace them with a placeholder string like -------placeholder--------. Don't worry, that placeholder will go back to being the space between paragraphs.
2. Find the remaining line breaks with [\r\n] and replace them with nothing.
3. Replace -------placeholder-------- with the double line breaks [\r\n][\r\n].
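For reference, the three placeholder steps above could be sketched in Python roughly as follows. This is only an illustration under a few assumptions: the function name `unwrap_with_placeholder` is made up, `\r?\n` is used instead of `[\r\n]` so that a single Windows `\r\n` is not mistaken for a double break, and paragraph breaks are restored as `\n\n` regardless of the original line-ending style.

```python
import re

# Placeholder text from the steps above; it must not occur in the real text.
PLACEHOLDER = "-------placeholder--------"


def unwrap_with_placeholder(text: str) -> str:
    """Join the lines of each paragraph using the three-step placeholder idea."""
    # Step 1: protect paragraph breaks (two or more consecutive line breaks).
    text = re.sub(r"(?:\r?\n){2,}", PLACEHOLDER, text)
    # Step 2: remove the remaining single line breaks.
    text = re.sub(r"\r?\n", "", text)
    # Step 3: turn the placeholder back into a paragraph break.
    return text.replace(PLACEHOLDER, "\n\n")


if __name__ == "__main__":
    print(unwrap_with_placeholder("one \ntwo\n\nthree\nfour"))
```

Because step 2 replaces with nothing, words only stay separated if the original lines end in a space; replacing with a single space instead would be a small variation.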
But I'm curious: is there a way to do this with fewer steps? For example, is it possible in regex to say "find all line breaks, except pairs of line breaks, and replace with nothing"? This would eliminate the need for the placeholder step.
Upvotes: 0
Views: 178
Reputation:
Yes, it's possible to do this with a single regex.
The approach is to find two letters (more precisely, two non-whitespace characters) separated by a single line break.
Example:
This is first sentence in paragraph.\nThis is the second.
This is the second paragraph.
Make sense?
This is available in two versions: with trimming of the non-line-break whitespace around the line break, and without trimming.
# Trimming:
# Find: (?<=\S)[^\S\r\n]*\r\n[^\S\r\n]*(?=\S)
# Replace: ' '
(?<= \S )
[^\S\r\n]* \r \n [^\S\r\n]*
(?= \S )
and
# Non-Trimming
# Find: (\S[^\S\r\n]*)\r\n([^\S\r\n]*\S)
# Replace: '$1 $2'
( \S [^\S\r\n]* ) # (1)
\r \n
( [^\S\r\n]* \S ) # (2)
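To try both versions outside an editor, here is a small Python sketch. The patterns are copied from above; the function names and the sample text are made up for illustration, and Python's `re` module is just one engine where these patterns happen to work unchanged (its replacement syntax is `\1 \2` rather than `$1 $2`).

```python
import re

# Trimming version: only the line break and the horizontal whitespace around
# it are matched; the neighbouring characters are checked with lookarounds.
TRIM = re.compile(r"(?<=\S)[^\S\r\n]*\r\n[^\S\r\n]*(?=\S)")

# Non-trimming version: the neighbouring characters and whitespace are
# captured and written back around a single space.
NO_TRIM = re.compile(r"(\S[^\S\r\n]*)\r\n([^\S\r\n]*\S)")


def join_trimmed(text: str) -> str:
    """Replace each single line break (plus surrounding blanks) with one space."""
    return TRIM.sub(" ", text)


def join_untrimmed(text: str) -> str:
    """Replace each single line break with a space, keeping surrounding blanks."""
    return NO_TRIM.sub(r"\1 \2", text)


if __name__ == "__main__":
    sample = "First line  \r\n  second line.\r\n\r\nNext paragraph."
    print(repr(join_trimmed(sample)))
    print(repr(join_untrimmed(sample)))
```

One difference worth knowing: the non-trimming pattern consumes the characters on both sides of the break, so in this sketch a line made of a single non-whitespace character (as in "a\r\nb\r\nc") leaves the second break untouched, while the lookaround-based trimming pattern joins all three lines.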
Upvotes: 1
Reputation: 1983
OK, I can tell you how it would work for just \n.
In C#:
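using System;
using System.Text.RegularExpressions;
var input = "test\ntest2\n\ntest3\ntest4";
// Match a \n that has a non-\n character on both sides (a single line break).
var regex = @"\n(?:(?=[^\n])(?<=[^\n]\n))";
var s2 = Regex.Replace(input, regex, "");
Console.WriteLine(s2);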
Result:
testtest2

test3test4
And I think I got it for \r\n, but test it thoroughly ;)
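using System;
using System.Text.RegularExpressions;
var input = "test\r\ntest2\r\n\r\ntest3\r\ntest4";
// Match a \r\n that is neither preceded nor followed by another \r\n.
var regex = @"(?<!\r\n)\r\n(?!\r\n)";
var s2 = Regex.Replace(input, regex, "");
Console.WriteLine(s2);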
Result:
testtest2

test3test4
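In the spirit of "test it thoroughly", here is a small Python sketch that folds both cases into one pattern and checks it against the two inputs above. The combined pattern `(?<![\r\n])\r?\n(?![\r\n])` is a variation of my own, not the exact regex from either snippet, and the function name is made up; it assumes line endings are either \n or \r\n.

```python
import re

# A line break (\n or \r\n) that has no other line-break character directly
# before or after it, i.e. a single break inside a paragraph.
SINGLE_BREAK = re.compile(r"(?<![\r\n])\r?\n(?![\r\n])")


def remove_single_breaks(text: str) -> str:
    """Delete single line breaks; runs of two or more breaks are left alone."""
    return SINGLE_BREAK.sub("", text)


if __name__ == "__main__":
    print(repr(remove_single_breaks("test\ntest2\n\ntest3\ntest4")))
    print(repr(remove_single_breaks("test\r\ntest2\r\n\r\ntest3\r\ntest4")))
```

Like the C# versions, this deletes the break outright, so lines are joined without a space; passing " " as the replacement would be the obvious tweak if the source lines do not end in a space.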
Upvotes: 0