Reputation: 145
I am using a built-in function to remove duplicate lines, but the function also treats blank rows as duplicates. Could anyone help me find where I am mistaken? Here is my code:
protected void Remove_Duplicate_Lines_Click(object sender, EventArgs e)
{
(Remove_Empty_Lines_CheckBox_id.Checked) // Remove Blank Rows
String input_txt = "A \n\n B \n \n B \n\n C \n\n C \n\n D \n\n E";
string[] distinctLines = input_txt.Split(new string[] { Environment.NewLine }, StringSplitOptions.None).Distinct().ToArray();
txt_output.InnerText = string.Join("\r\n", distinctLines);
}
Example 2 (undesired)
Upvotes: 2
Views: 555
Reputation: 28499
Using Distinct() will not work here: it treats every blank line as a duplicate of the first one, and it is not guaranteed to keep the order of elements. Use a traditional approach instead: a loop and some variables to remember the state.
Use a HashSet to remember the lines you have seen before and a bool variable to remember whether an empty line has appeared in the input since the last line was added to the output list.
string inputText = textBox1.Text;
List<string> outputLines = new List<string>();
// Use appropriate String Comparer based on your requirements
HashSet<string> seenLines = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);
bool seenEmptyLine = false;
string[] lines = inputText.Split('\n');
foreach (string line in lines)
{
    string trimmedLine = line.Trim();
    if (trimmedLine == "")
    {
        // When we see an empty line, we remember that we have seen one
        seenEmptyLine = true;
    }
    else
    {
        // When we see a non-empty line, we add it only when we have not seen it before
        if (seenLines.Contains(trimmedLine))
        {
            // Seen line before, skip it
        }
        else
        {
            // Remember we have seen this line
            seenLines.Add(trimmedLine);
            // If we have seen an empty line since adding the last line,
            // add an empty line
            if (seenEmptyLine)
            {
                seenEmptyLine = false;
                outputLines.Add("");
            }
            outputLines.Add(trimmedLine);
        }
    }
}
string outputText = string.Join(Environment.NewLine, outputLines);
textBox2.Text = outputText;
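As a rough sketch of how the loop above could be packaged and tried on the input from the question, the version below wraps it in a method. The names DuplicateLineRemover and RemoveDuplicateLines are made up for illustration, and it relies on HashSet.Add returning false for a line that is already present instead of a separate Contains check.

```csharp
using System;
using System.Collections.Generic;

public static class DuplicateLineRemover
{
    // Hypothetical helper: drops repeated non-empty lines and emits at most
    // one blank line before each line that is kept.
    public static List<string> RemoveDuplicateLines(string input)
    {
        var output = new List<string>();
        var seen = new HashSet<string>(StringComparer.CurrentCultureIgnoreCase);
        bool pendingBlank = false;

        foreach (string raw in input.Split('\n'))
        {
            string line = raw.Trim(); // also strips a trailing '\r'
            if (line.Length == 0)
            {
                pendingBlank = true;
                continue;
            }

            // HashSet.Add returns false when the line was already present.
            if (!seen.Add(line))
            {
                continue;
            }

            if (pendingBlank)
            {
                output.Add("");
                pendingBlank = false;
            }
            output.Add(line);
        }
        return output;
    }

    public static void Main()
    {
        // Sample input taken from the question.
        string sample = "A \n\n B \n \n B \n\n C \n\n C \n\n D \n\n E";
        // Joined with "|" so the blank lines are visible.
        Console.WriteLine(string.Join("|", RemoveDuplicateLines(sample))); // A||B||C||D||E
    }
}
```

Each duplicate B and C is dropped, while one blank line survives between the remaining entries.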
Upvotes: 1
Reputation: 46947
One way would be to implement your own EqualityComparer that never treats two empty strings as equal, so that Distinct() keeps every blank line:
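void Main()
{
    var str = "A\nA\n\n\nB\n\nD\nA\n\nE";
    var before = str.Split(new string[] { "\n" }, StringSplitOptions.None);
    // Distinct() is lazy; enumerate "after" (e.g. with ToArray()) to get the result
    var after = before.Distinct(new MyComparer());
}
public class MyComparer : EqualityComparer<string>
{
    public override bool Equals(string x, string y)
    {
        // Two empty lines are never considered duplicates of each other
        if (x == "" && y == "")
            return false;
        // Static string.Equals is null-safe, unlike x.Equals(y)
        return string.Equals(x, y);
    }
    public override int GetHashCode(string obj)
    {
        return obj.GetHashCode();
    }
}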
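To see what this approach produces, here is a self-contained sketch that materializes the result. KeepBlanksComparer and ComparerDemo are hypothetical names, and the comparer repeats the idea above. Two caveats apply: every blank line is kept, including consecutive ones, and the comparer deliberately breaks the usual rule that a value equals itself, so this depends on how Distinct() currently uses the comparer.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical name; same idea as the comparer above.
public sealed class KeepBlanksComparer : EqualityComparer<string>
{
    public override bool Equals(string x, string y)
    {
        // Empty strings never match each other, so none is dropped as a duplicate.
        if (x == "" && y == "")
        {
            return false;
        }
        return string.Equals(x, y);
    }

    public override int GetHashCode(string obj)
    {
        return obj == null ? 0 : obj.GetHashCode();
    }
}

public static class ComparerDemo
{
    // Splits on "\n" and removes duplicate non-empty lines only.
    public static string[] DistinctKeepingBlanks(string text)
    {
        return text.Split(new[] { "\n" }, StringSplitOptions.None)
                   .Distinct(new KeepBlanksComparer())
                   .ToArray();
    }

    public static void Main()
    {
        string[] result = DistinctKeepingBlanks("A\nA\n\n\nB\n\nD\nA\n\nE");
        // Joined with "|" so the blank lines are visible.
        Console.WriteLine(string.Join("|", result)); // A|||B||D||E
    }
}
```

The two consecutive blank lines after A both survive, which may or may not be what you want.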
Upvotes: 0
Reputation: 1041
There are a few problems, mostly with your split. You compose input_txt to contain "\n" but later try to split on Environment.NewLine, which is "\r\n" on Windows. Thus the split won't occur as you desire.
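A minimal sketch of the mismatch, assuming text that uses bare "\n" line endings: splitting on "\r\n" alone finds nothing to split on, while passing both separators handles either style. LineSplitDemo and SplitLines are illustrative names only.

```csharp
using System;

public static class LineSplitDemo
{
    // Hypothetical helper: accepts both Windows ("\r\n") and Unix ("\n") line endings.
    public static string[] SplitLines(string text)
    {
        return text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.None);
    }

    public static void Main()
    {
        string input = "A\n\nB";

        // The text contains no "\r\n", so the whole string comes back as one element.
        string[] windowsOnly = input.Split(new[] { "\r\n" }, StringSplitOptions.None);
        Console.WriteLine(windowsOnly.Length); // 1

        // With both separators the result is "A", "" and "B".
        Console.WriteLine(SplitLines(input).Length); // 3
    }
}
```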
Let's consider this:
String input_txt = "A \n A \n B \n D \n A \n E";
Secondly, even if you split on "\n", the result will have two entries for A, namely "A " and " A ", because you have not trimmed anything.
My suggestion would be to split on more than one separator and remove empty entries. You also need to either Trim() each item or add " " to the separator list; note that splitting on " " also breaks any line containing spaces into separate words, so Trim() is the safer choice when lines can hold more than one word. The final result will not have any blank lines between entries. If you want blank lines in the output, add them yourself when you write it out.
String input_txt = "A \n A \n B \n D \n A \n E";
string[] distinctLines = input_txt.Split(new string[] { Environment.NewLine, "\n", " " }, StringSplitOptions.RemoveEmptyEntries)
.Select(x => x.Trim())
.Distinct()
.ToArray();
txt_output.InnerText = string.Join(Environment.NewLine + Environment.NewLine, distinctLines);
This will output the following, with a blank line between entries:
A
B
D
E
Upvotes: 2