Natasha Thapa

Reputation: 969

Find duplicate items in csv file

I have a CSV file with FirstName, LastName, and ID columns. ID is a unique column.

Chris, Webber, 1 
Chris, Ben, 2
Chris, Dudley, 3
David, Floy, 4
Chris, Ben, 5 
Chris, Webber, 6

I need to build two lists without using a database: I need to read the file in C# and create a duplicate list and an original list.

The duplicate list contains every entry that has a duplicate:

Chris, Webber, 1
Chris, Webber, 6
Chris, Ben, 2
Chris, Ben, 5

The original list contains the unique entries plus the first occurrence of each duplicated entry:

Chris, Webber, 1
Chris, Ben, 2
Chris, Dudley, 3
David, Floy, 4

What is the best way to solve this?

Upvotes: 2

Views: 5025

Answers (3)

Bagzli

Reputation: 6579

  1. Create a string array, map, ArrayList, or List to hold the unique IDs. Use whichever you are most comfortable working with.
  2. Read the file in line by line.
  3. Check whether the ID is already in the collection you created. If it is not, add it; if it is, skip it.

As you add each ID to the collection, you can also add the entire row to a dataset, which then holds all the unique records you have found.
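
The steps above could be sketched roughly as follows. This is only an illustration under a few assumptions that the answer does not state: because the ID column in the question is already unique per row, the key here is the first two columns (first and last name) rather than the ID; a `HashSet<string>` stands in for the tracking collection; and `FirstOccurrences` is a hypothetical helper name.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: keeps each row whose name key has not been seen before.
static List<string> FirstOccurrences(IEnumerable<string> lines)
{
    var seenKeys = new HashSet<string>();
    var uniqueRows = new List<string>();

    foreach (var line in lines)
    {
        if (string.IsNullOrWhiteSpace(line)) continue;   // skip blank lines

        var fields = line.Split(',');
        if (fields.Length < 2) continue;                 // skip malformed rows

        // Assumption: two rows are duplicates when first and last name match.
        var key = fields[0].Trim() + "|" + fields[1].Trim();

        // HashSet.Add returns false when the key is already present.
        if (seenKeys.Add(key))
            uniqueRows.Add(line.Trim());
    }

    return uniqueRows;
}

var sample = new[]
{
    "Chris, Webber, 1", "Chris, Ben, 2", "Chris, Dudley, 3",
    "David, Floy, 4", "Chris, Ben, 5", "Chris, Webber, 6",
};

foreach (var row in FirstOccurrences(sample))
    Console.WriteLine(row);
```

To read from disk instead of the in-memory sample, the same helper can be given `File.ReadLines(path)` (from `System.IO`), which yields the file one line at a time.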

Upvotes: 1

Austin Salonen

Reputation: 50235

using System;
using System.IO;
using System.Linq;

var lines = File.ReadLines("yourFile.ext");

// this assumes you can hold the whole file in memory

// uniqueness is defined by the first two columns
var grouped = lines.GroupBy(line => string.Join(", ", line.Split(',').Take(2)))
                   .ToArray();

// "unique entry and first occurrence of duplicate entry" -> first entry in group
var unique = grouped.Select(g => g.First());

var dupes = grouped.Where(g => g.Count() > 1)
                   .SelectMany(g => g);

Console.WriteLine("unique");
foreach (var name in unique)
    Console.WriteLine(name);

Console.WriteLine("\nDupes");
foreach (var name in dupes)
    Console.WriteLine(name);

Output:

unique
Chris, Webber, 1
Chris, Ben, 2
Chris, Dudley, 3
David, Floy, 4

Dupes
Chris, Webber, 1
Chris, Webber, 6
Chris, Ben, 2
Chris, Ben, 5

Upvotes: 6

Nevyn

Reputation: 2683

Read the file in line by line, treating it like a plain text file.

Parse each line using string.Split on ','.

Use one List to track IDs, checking it with .Contains.

Use custom data objects for the data itself, and make two lists: one for the unique entries and one for the duplicates (three lists in total).

If you want actual code examples, please post what you have tried, along with the errors you are getting, so I can help debug it.
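
For reference, the three-list approach might look something like the sketch below. It is not the answerer's code and rests on some assumptions: the tracking list holds name keys rather than IDs (the IDs in the question are unique, so they would never match), a named tuple stands in for a custom data class, and the sample rows are hard-coded in place of file input.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

var rows = new[]
{
    "Chris, Webber, 1", "Chris, Ben, 2", "Chris, Dudley, 3",
    "David, Floy, 4", "Chris, Ben, 5", "Chris, Webber, 6",
};

// List 1: keys already seen. Lists 2 and 3: the two result lists.
var seenNames = new List<string>();
var originals = new List<(string FirstName, string LastName, int Id)>();
var duplicates = new List<(string FirstName, string LastName, int Id)>();

foreach (var row in rows)
{
    var parts = row.Split(',');
    var person = (FirstName: parts[0].Trim(),
                  LastName: parts[1].Trim(),
                  Id: int.Parse(parts[2].Trim()));
    var name = person.FirstName + "|" + person.LastName;

    if (!seenNames.Contains(name))
    {
        // First time this name appears: it belongs in the original list.
        seenNames.Add(name);
        originals.Add(person);
        continue;
    }

    // Repeat: record it, and make sure the first occurrence is listed once too.
    var first = originals.First(p => p.FirstName == person.FirstName
                                  && p.LastName == person.LastName);
    if (!duplicates.Contains(first))
        duplicates.Add(first);
    duplicates.Add(person);
}

Console.WriteLine($"originals: {originals.Count}, duplicates: {duplicates.Count}");
// prints "originals: 4, duplicates: 4" for the sample rows
```

The duplicate list is filled in encounter order, so its ordering can differ from the one shown in the question. `List<T>.Contains` is a linear scan; for large files a `HashSet<string>` would likely be a better fit for the tracking collection.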

Upvotes: 1
