How to correctly clean spaces or enters from list of strings in WPF?

Question

I am creating a short desktop application that cleans extra spaces and line breaks from a string. You know how, when you copy text from a PDF to paste it into e.g. Google Translate, the pasted text is broken into lines with extra line breaks or spaces. So I wrote this simple app for myself: it removes the extra spaces and line breaks and joins the text into one paragraph.

Here is my code, with a comment marking where I traced the mistake while debugging:

List<string> content = new List<string>();
TextRange textRange = new TextRange(RichTb1.Document.ContentStart, RichTb1.Document.ContentEnd);
TextRange joinText = new TextRange(RichTb2.Document.ContentStart, RichTb2.Document.ContentEnd);

string[] lines = textRange.Text.Split(new string[] { "\r\n", "\n" }, StringSplitOptions.None);
//up to here everything is OK: "lines" contains every line that I put into RichTb1
content.AddRange(lines);

//this just validates that the entry in RichTb1 is not empty (if it is not empty, proceed with the action)
string match1 = content.ElementAt(0);

if (!string.IsNullOrWhiteSpace(match1))
{
   //**Here is the problem: it removes the empty lines, but it also removes some non-empty strings, see the example below**
   content = content.Where(s => !string.IsNullOrWhiteSpace(s)).Distinct().ToList();

   joinText.Text = content.Aggregate((i, j) => i + " " + j);  
}

Here is what it does. For example, you enter some random text like this:

"Chapter 4 illustrates the growing recognition
of
the
benefits
of
community
management
of
natural
resources.
To
ensure
that

such
approaches
do
not
exclude
poor
people,

**women,
the
elderly**
and
other
marginalized

groups,
governments
and
other
organizations

that
sponsor
community-based
projects
need

to
involve
all
groups
in
decision-making
and

implementation."

My result from my app is this:

"Chapter 4 illustrates the growing recognition of the benefits community management natural resources. To ensure that such approaches do not exclude poor people, **women, elderly** and other marginalized groups, governments organizations sponsor community-based projects need to involve all groups in decision-making implementation."

As you can see (this is just an example), it removes some words that it should not. In the example above (bold text), the word "the" is missing from the result, although it is in the original text and I can also see it in my lines array. But when the lines reach the problem line, strings (words) that should be kept get removed.

Any idea what the problem is? Thanks in advance.

Michael McGriff · Accepted Answer

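Distinct() returns only the distinct elements of the list, so every repeated line (here words such as "the", "of" and "and") is dropped after its first occurrence. Just remove the Distinct() call and you should have no further problem.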

See the MSDN docs here: http://msdn.microsoft.com/en-us/library/system.linq.enumerable.distinct(v=vs.95).aspx
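
To make the fix concrete, here is a minimal sketch of the same pipeline without Distinct(). The LineJoiner class and the JoinLines method are hypothetical names introduced only for this illustration, and the separators "\r\n" and "\n" are an assumption about how the pasted text is broken into lines. It also swaps Aggregate for string.Join, which is a different call from the one in the question; one reason to prefer it is that Aggregate without a seed throws on an empty sequence, while string.Join simply returns an empty string.

```csharp
using System;
using System.Linq;

// Hypothetical helper, not part of the original post.
public static class LineJoiner
{
    // Joins the non-blank lines of `text` into a single paragraph.
    // Repeated words are kept because Distinct() is not called.
    public static string JoinLines(string text)
    {
        // Assumed separators: Windows and Unix line endings.
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);

        // Drop empty or whitespace-only lines only; keep duplicates.
        var nonBlank = lines.Where(s => !string.IsNullOrWhiteSpace(s));

        // string.Join is safe even when no lines survive the filter.
        return string.Join(" ", nonBlank);
    }
}
```

In the handler from the question this would be used roughly as joinText.Text = LineJoiner.JoinLines(textRange.Text); and an input such as "of", "the", blank line, "benefits", "of", "the" should then come out as "of the benefits of the" rather than "of the benefits".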
