Reputation: 48476
I have a string like this
"a a a a aaa b c d e f a g a aaa aa a a"
I want to turn it into either
"a b c d e f a g a"
or
"a b c d e f a g a "
(whichever's easier, it doesn't matter since it'll be HTML)
The "a"s are line breaks (\r\n), in case that changes anything.
Upvotes: 0
Views: 1788
Reputation: 48476
Went with this:
private string GetDescriptionFor(HtmlDocument document)
{
    string description = CrawlUsingMetadata(XPath.ResourceDescription, document);
    // Three or more line breaks (CRLF or LF), each optionally followed by spaces.
    // No RegexOptions are needed: the pattern has no anchors and no letters.
    Regex regex = new Regex(@"(?:\r?\n[ ]*){3,}");
    string result = regex.Replace(description, "\n\n");
    string decoded = HttpUtility.HtmlDecode(result);
    return decoded;
}
As intended, it leaves line breaks alone unless there are three or more in a row (ignoring the spaces between them), and it replaces each such run with \n\n.
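For readers following the JavaScript answers in this thread, here is a minimal sketch of the same idea. It assumes the pattern is equivalent to (?:\r?\n[ ]*){3,}; the function name collapseBlankLines and the sample string are made up for illustration.

```javascript
// Collapse runs of three or more line breaks (CRLF or LF), each optionally
// followed by spaces, into a single blank line ("\n\n").
// collapseBlankLines is a hypothetical helper name, not from the original post.
function collapseBlankLines(text) {
  return text.replace(/(?:\r?\n[ ]*){3,}/g, "\n\n");
}

// Made-up sample: four breaks in a row, then a single break, then two breaks.
const blankLineSample = "first\r\n  \r\n\r\n   \r\nsecond\r\nthird\n\nfourth";
console.log(JSON.stringify(collapseBlankLines(blankLineSample)));
// prints "first\n\nsecond\r\nthird\n\nfourth"
```

Runs of one or two line breaks are left untouched, so the result can still mix \r\n and \n.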
Upvotes: 1
Reputation: 43673
If you need C# code and want to collapse just the \r\n sequences, together with any leading and trailing whitespace, then the solution is pretty simple:
string result = Regex.Replace(input, @"\s*\r\n\s*", "\r\n");
Check this code here.
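The same pattern can be tried in JavaScript; the sketch below is only an illustration, and the name collapseCrlf and the inputs are assumptions. Because \s also matches \r and \n, a whole run of line breaks and surrounding whitespace becomes a single \r\n, including any indentation at the start of the following line.

```javascript
// Replace each \r\n, plus any whitespace around it, with one \r\n.
// collapseCrlf is a hypothetical helper name used for illustration.
function collapseCrlf(text) {
  return text.replace(/\s*\r\n\s*/g, "\r\n");
}

console.log(JSON.stringify(collapseCrlf("x  \r\n \r\n\r\n  y")));
// prints "x\r\ny"

// A lone \n is not matched, since the pattern requires a literal \r\n.
console.log(JSON.stringify(collapseCrlf("x \n y")));
// prints "x \n y"
```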
Upvotes: 0
Reputation: 43673
Generally your code should be:
s.replace(new RegExp("(\\S)(?:\\s*\\1)+","g"), "$1");
Check this fiddle.
But, depending on what the characters a, b, c, ... represent in your question, you might need to change \\S to another class, such as [^ ], and \\s to [ ], if you want \r and \n to be collapsed as well:
s.replace(new RegExp("([^ ])(?:[ ]*\\1)+","g"), "$1");
Check this fiddle.
However, if a stands for the string \r\n, then you need a slightly more complicated pattern:
s.replace(new RegExp("(\\r\\n|\\S)(?:[^\\S\\r\\n]*\\1)+","g"), "$1");
Check this fiddle.
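To compare the three patterns side by side, here is a small sketch using regex literals instead of new RegExp. The function names and sample inputs are invented for illustration. Note that the first pattern also collapses repeated letters inside words (for example "aaa" becomes "a"), because \s* may match nothing.

```javascript
// 1. Any non-whitespace character repeated, with optional whitespace between copies.
function collapseRepeats(s) {
  return s.replace(/(\S)(?:\s*\1)+/g, "$1");
}

// 2. Any non-space character (so \r and \n count too), separated only by spaces.
function collapseRepeatsSpacesOnly(s) {
  return s.replace(/([^ ])(?:[ ]*\1)+/g, "$1");
}

// 3. Treat \r\n as a single unit; copies may be separated by whitespace
//    other than \r and \n (spaces, tabs).
function collapseRepeatsCrlf(s) {
  return s.replace(/(\r\n|\S)(?:[^\S\r\n]*\1)+/g, "$1");
}

console.log(collapseRepeats("a a  a b c a g a  a"));
// prints a b c a g a
console.log(JSON.stringify(collapseRepeatsSpacesOnly("x\n \n\ny")));
// prints "x\ny"
console.log(JSON.stringify(collapseRepeatsCrlf("x\r\n \r\n\t\r\ny")));
// prints "x\r\ny"
```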
Upvotes: 1
Reputation: 36622
If I understand the problem correctly, the goal is to remove duplicate copies of a specific character/string, possibly separated by spaces. You can do that by replacing the regular expression (a\s*)+ with "a " (an a followed by one space): a\s* matches an a followed by optional whitespace, and + matches one or more consecutive copies. How precisely you do that depends on the language: in Perl it's $str =~ s/(a\s*)+/a /g, in Ruby it's str.gsub(/(a\s*)+/, "a "), and so on.
The fact that a is actually \r\n shouldn't complicate things, but might mean that the replacement would work better as s/(\r\n[ \t]*)+/\r\n/g (since \s overlaps with \r and \n).
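As a rough JavaScript counterpart to the Perl and Ruby one-liners, the sketch below shows both the generic form and the \r\n variant. The function names and inputs are assumptions made for the example.

```javascript
// Generic form: collapse repeated copies of the literal "a" (with optional
// whitespace after each copy) into "a" plus one space.
function collapseLiteralA(s) {
  return s.replace(/(?:a\s*)+/g, "a ");
}

// \r\n variant: only spaces and tabs are allowed between the line breaks,
// so the separator class does not overlap with \r or \n.
function collapseCrlfRuns(s) {
  return s.replace(/(?:\r\n[ \t]*)+/g, "\r\n");
}

console.log(JSON.stringify(collapseLiteralA("a a  a b c a g a  a")));
// prints "a b c a g a "
console.log(JSON.stringify(collapseCrlfRuns("x\r\n \r\n\t\r\n  y")));
// prints "x\r\ny"
```

Whitespace that precedes the first \r\n of a run is kept by the second function, since the pattern only consumes spaces and tabs that follow a line break.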
Upvotes: 0