Emmy

Reputation: 145

How to remove duplicate lines without removing blank lines in C#?

I am using a built-in function to remove duplicate lines, but it also treats blank rows as duplicates and removes them. Could anyone help me see where I am going wrong? Here is my code:

protected void Remove_Duplicate_Lines_Click(object sender, EventArgs e)
{
    (Remove_Empty_Lines_CheckBox_id.Checked)      // Remove Blank Rows

    String input_txt = "A \n\n B \n \n B \n\n C \n\n C \n\n D \n\n E";

    string[] distinctLines = input_txt.Split(new string[] { Environment.NewLine }, StringSplitOptions.None).Distinct().ToArray();
    txt_output.InnerText = string.Join("\r\n", distinctLines);
}

Example: (image not included)

Example 2 (undesired): (image not included)

Upvotes: 2

Views: 555

Answers (3)

NineBerry

Reputation: 28499

Using Distinct() will not work here: it treats every identical blank line after the first as a duplicate and removes it, and it is not documented as guaranteed to keep the order of elements. Use a traditional approach instead: a loop and some variables to remember the state.

Use a HashSet to remember the lines you have seen before, and a bool variable to remember whether an empty line has appeared in the input since the last line was added to the output list.

string inputText = textBox1.Text;
List<string> outputLines = new List<string>();

// Use appropriate String Comparer based on your requirements
HashSet<string> seenLines = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);
bool seenEmptyLine = false;

string[] lines = inputText.Split('\n');

foreach(string line in lines)
{
    string trimmedLine = line.Trim();

    if(trimmedLine == "")
    {
        // When we see an empty line, we remember that we have seen one
        seenEmptyLine = true;
    }
    else
    {
        // When we see a non-empty line, we add it only when we have not seen it before
        if(seenLines.Contains(trimmedLine))
        {
            // Seen line before, skip it
        }
        else
        {
            // Remember we have seen this line
            seenLines.Add(trimmedLine);

            // if we have seen an empty line since adding last line,
            // add empty line
            if(seenEmptyLine)
            {
                seenEmptyLine = false;
                outputLines.Add("");
            }

            outputLines.Add(trimmedLine);
        }
    }

}

string outputText = string.Join(Environment.NewLine, outputLines);

textBox2.Text = outputText;

Upvotes: 1

Magnus

Reputation: 46947

One way would be to implement your own EqualityComparer that never treats two empty strings as equal:

void Main()
{
    var str = "A\nA\n\n\nB\n\nD\nA\n\nE";

    var before = str.Split(new string[] { "\n" }, StringSplitOptions.None);
    var after = before.Distinct(new MyComparer());
}

public class MyComparer : EqualityComparer<string>
{
    public override bool Equals(string x, string y)
    {
        // Two empty lines are never reported as equal, so Distinct() keeps them all
        if(x == "" && y == "")
            return false;
        return x.Equals(y);
    }

    public override int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }
}
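
Note that returning false from Equals("", "") breaks the reflexivity that equality comparers are expected to provide, so this relies on how Distinct() happens to call the comparer. As an alternative sketch (not part of the answer above), the same effect can be had without a custom comparer by filtering with HashSet.Add, which returns false when the item is already in the set. The helper name DistinctKeepingBlanks is hypothetical, and the ordinal comparison is an assumption; pick the comparer that matches your requirements.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: keeps every empty line and only the first
// occurrence of each non-empty line, in the original order.
static List<string> DistinctKeepingBlanks(IEnumerable<string> lines)
{
    // Assumption: case-sensitive ordinal comparison is what is wanted.
    var seen = new HashSet<string>(StringComparer.Ordinal);

    // Empty lines always pass; other lines pass only the first time,
    // because HashSet.Add returns false for an item it already holds.
    return lines.Where(l => l.Length == 0 || seen.Add(l)).ToList();
}

var before = "A\nA\n\n\nB\n\nD\nA\n\nE".Split('\n');
var after = DistinctKeepingBlanks(before);

Console.WriteLine(string.Join("|", after)); // A|||B||D||E
```

ToList() forces the filter to run once, so the HashSet is not reused across repeated enumerations.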

Upvotes: 0

Rick Davin

Reputation: 1041

There are a few problems, mostly with your split. You compose input_txt to contain "\n" but later try to split on Environment.NewLine, which is "\r\n" on Windows. Thus the split won't occur as you desire.
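
A small sketch of that mismatch, assuming the input uses bare "\n" as in the question. It uses the literal "\r\n" (the Windows value of Environment.NewLine) so the result does not depend on the platform it runs on; the variable names are illustrative only.

```csharp
using System;

string input = "A \n\n B \n \n B";

// A "\r\n" separator never matches a string that only contains "\n",
// so the whole input comes back as a single element.
string[] crlfOnly = input.Split(new[] { "\r\n" }, StringSplitOptions.None);
Console.WriteLine(crlfOnly.Length); // 1

// Listing both separators handles either line-ending convention.
string[] both = input.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
Console.WriteLine(both.Length); // 5
```

With only one element in the array, Distinct() has nothing to remove, which is why the original code appears to do nothing to such input.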

Let's consider this:

String input_txt = "A \n A \n B \n D \n A \n E";

Secondly, even if you split on "\n", the result will have 2 entries for A, namely "A " and " A ", because you have not trimmed anything.

My suggestion would be to split on more than one pattern and remove empty entries. You would either need to Trim() each item or add " " to the split patterns (note that the latter also breaks up lines that contain spaces). The final result will not have any blank lines between entries. If you want blank lines in the output, add them yourself when you write it.
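
To make the effect of the missing Trim() concrete, here is a minimal sketch using the sample string above; the variable names are illustrative only.

```csharp
using System;
using System.Linq;

string input = "A \n A \n B \n D \n A \n E";

// Without trimming, "A " and " A " are different strings and both survive.
string[] untrimmed = input.Split('\n').Distinct().ToArray();
Console.WriteLine(untrimmed.Length); // 5

// Trimming first makes all three A lines compare equal.
string[] trimmed = input.Split('\n').Select(s => s.Trim()).Distinct().ToArray();
Console.WriteLine(string.Join(",", trimmed)); // A,B,D,E
```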

String input_txt = "A \n A \n B \n D \n A \n E";

string[] distinctLines = input_txt.Split(new string[] { Environment.NewLine, "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
                                    .Select(x => x.Trim())
                                    .Distinct()
                                    .ToArray();
txt_output.InnerText = string.Join(Environment.NewLine + Environment.NewLine, distinctLines);

This will output:

A

B

D

E

Upvotes: 2
