Reputation: 2448
I am creating a short desktop application that removes extra spaces and line breaks from a string. Sometimes when you copy text from a PDF to paste it into, e.g., Google Translate, the pasted text is broken into lines with extra line breaks or spaces. So I wrote this simple app for myself; it removes the extra spaces and line breaks and joins the text into one paragraph.
Here is my code, with comments marking what I found while debugging:
List<string> content = new List<string>();
TextRange textRange = new TextRange(RichTb1.Document.ContentStart, RichTb1.Document.ContentEnd);
TextRange joinText = new TextRange(RichTb2.Document.ContentStart, RichTb2.Document.ContentEnd);
string[] lines = textRange.Text.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
// Everything is fine up to here: "lines" contains every line I put into RichTb1
content.AddRange(lines);
// This just validates that the entry in RichTb1 is not empty (if it is not empty, proceed)
string match1 = content.ElementAt(0);
if (!string.IsNullOrWhiteSpace(match1))
{
// **Here is the problem: it removes the empty lines (spaces or line breaks), but it also removes some non-empty strings; see the example below**
content = content.Where(s => !string.IsNullOrWhiteSpace(s)).Distinct().ToList();
joinText.Text = content.Aggregate((i, j) => i + " " + j);
}
Here is what it produces. For example, if you enter some text like this:
"Chapter 4 illustrates the growing recognition
of
the
benefits
of
community
management
of
natural
resources.
To
ensure
that
such
approaches
do
not
exclude
poor
people,
**women,
the
elderly**
and
other
marginalized
groups,
governments
and
other
organizations
that
sponsor
community-based
projects
need
to
involve
all
groups
in
decision-making
and
implementation."
The result from my app is this:
"Chapter 4 illustrates the growing recognition of the benefits community management natural resources. To ensure that such approaches do not exclude poor people, **women, elderly** and other marginalized groups, governments organizations sponsor community-based projects need to involve all groups in decision-making implementation."
As you can see (this is just an example), it removes some words that it should not. In the example above (bold text), the word "the" is missing, although it is present in the original text. I can also see this word in my lines. But once the lines reach the problem line, it removes strings (words) that it should not.
Any ideas what the problem is? Thanks in advance.
Upvotes: 0
Views: 173
Reputation: 460228
Even though an answer has already been accepted, I would suggest an uncool approach. A plain StringBuilder is more efficient and foolproof:
StringBuilder sb = new StringBuilder(text.Length);
bool firstSpace = true;
char[] dont = { '\n', '\r' };
for(int i = 0; i < text.Length; i++)
{
char c = text[i];
if (dont.Contains(c)) c = ' '; // replace new-line characters with a single space
bool isWhiteSpace = Char.IsWhiteSpace(c) ;
bool append = !isWhiteSpace || firstSpace;
firstSpace = !isWhiteSpace;
if(append) sb.Append(c);
}
string withOneSpaceAndNoLines = sb.ToString();
Upvotes: 3
Reputation: 803
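Distinct() returns only distinct elements, so every repeated line is dropped. Because each word in your input sits on its own line, repeated words such as "the", "of" and "and" are removed after their first occurrence. Just remove the Distinct() call and you should have no further problem.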
See the MSDN docs here: http://msdn.microsoft.com/en-us/library/system.linq.enumerable.distinct(v=vs.95).aspx
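As a rough illustration of that fix, here is a minimal sketch of the same pipeline without Distinct(). The LineJoiner class and JoinLines method are hypothetical names, not from the question, and the sketch takes a plain string instead of a RichTextBox so it can run in a console. It also trims each line and uses string.Join instead of Aggregate; both are my own assumptions about what is wanted, not part of the original code.

```csharp
using System;
using System.Linq;

public static class LineJoiner
{
    // Hypothetical helper: splits the text into lines, drops the blank ones,
    // and joins the rest with single spaces. No Distinct(), so repeated
    // words that sit on their own lines are kept.
    public static string JoinLines(string text)
    {
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        return string.Join(" ", lines
            .Where(s => !string.IsNullOrWhiteSpace(s)) // skip empty or whitespace-only lines
            .Select(s => s.Trim()));                   // assumption: strip stray spaces at line ends
    }

    public static void Main()
    {
        string input = "the benefits\r\nof\n\nthe\n   \ncommunity";
        Console.WriteLine(JoinLines(input)); // prints "the benefits of the community"
    }
}
```

In the question's code this would amount to replacing the two lines inside the if block with something like joinText.Text = string.Join(" ", content.Where(s => !string.IsNullOrWhiteSpace(s)));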
Upvotes: 2