user13111333

Clean duplicates and their instances from a list

I have a data model like this:

    public class AmpFile
    {
        public string filename { get; set; }
        public string actualpath { get; set; }
    }

Now I have a list of them like this:

[ list member 1 ]    -    filename:  "testfile1.jpg"    -    actualpath:  "C:\testpath\testfile1.jpg" 
[ list member 2 ]    -    filename:  "brickwall.jpg"    -    actualpath:  "C:\testpath\brickwall.jpg" 
[ list member 3 ]    -    filename:  "mydata.txt"    -    actualpath:  "D:\mydata.txt" 
[ list member 4 ]    -    filename:  "testfile1.jpg"    -    actualpath:  "E:\demo\testfile1.jpg" 
[ list member 5 ]    -    filename:  "mydata.txt"    -    actualpath:  "F:\somefolder\mydata.txt" 
[ list member 6 ]    -    filename:  "testfile1.jpg"    -    actualpath:  "F:\somefolder\testfile1.jpg" 
[ list member 7 ]    -    filename:  "testfile2.jpg"    -    actualpath:  "F:\somefolder\testfile2.jpg" 
[ list member 8 ]    -    filename:  "testfile3.jpg"    -    actualpath:  "D:\testfile3.jpg"

Now I want to find the duplicates of each member, and if a member has duplicates, I want to remove the duplicates and the original member itself. The result I want to achieve is:

[ list member 1 ]    -    filename:  "brickwall.jpg"    -    actualpath:  "C:\testpath\brickwall.jpg" 
[ list member 2 ]    -    filename:  "testfile2.jpg"    -    actualpath:  "F:\somefolder\testfile2.jpg" 
[ list member 3 ]    -    filename:  "testfile3.jpg"    -    actualpath:  "D:\testfile3.jpg" 

How can I do this?

Upvotes: 1

Views: 105

Answers (6)

kiliz

Reputation: 95

If you don't mind getting a new list instead of deleting from the original list, you can do it like this (sorry for the complexity; I think it can easily be optimized further):

List<AmpFile> foo(List<AmpFile> files)
{
 List<AmpFile> result = new List<AmpFile>();
 foreach(AmpFile file in files)
 {
  // Keep the file only if no other entry shares its filename.
  bool add = true;
  foreach(AmpFile other in files)
  {
   if(other != file && other.filename == file.filename)
   {
    add = false;
    break;
   }
  }
  if(add)
  {
   result.Add(file);
  }
 }
 return result;
}

If you really need to change the original list, you can do something like this (again, it can be optimized):

void foo2(List<AmpFile> files)
{
 AmpFile[] temp = files.ToArray();
 List<AmpFile> toDelete = new List<AmpFile>();
 foreach(AmpFile file in temp)
 {
  foreach(AmpFile f in files)
  {
   if(f != file && f.filename == file.filename)
   {
    if(!toDelete.Contains(f))
    {
     toDelete.Add(f);
    }
   }
  }
 }

 foreach(AmpFile file in toDelete)
 {
  files.Remove(file);
 }
}
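
One possible optimization of the in-place version above is to count the filenames in a single pass and then let List<T>.RemoveAll drop every entry whose filename was seen more than once. This is only a sketch: the class AmpFileCleaner and the method RemoveAllDuplicated are made-up names, and it assumes that no filename is null.

```csharp
using System;
using System.Collections.Generic;

public class AmpFile
{
    public string filename { get; set; }
    public string actualpath { get; set; }
}

public static class AmpFileCleaner
{
    // Hypothetical helper: removes, in place, every entry whose filename occurs more than once.
    // Assumes filenames are never null (a null key would make the dictionary throw).
    public static void RemoveAllDuplicated(List<AmpFile> files)
    {
        // First pass: count how often each filename occurs.
        var counts = new Dictionary<string, int>();
        foreach (AmpFile file in files)
        {
            counts.TryGetValue(file.filename, out int seen);
            counts[file.filename] = seen + 1;
        }

        // Second pass: drop every entry whose filename was counted more than once.
        files.RemoveAll(file => counts[file.filename] > 1);
    }
}
```

Calling AmpFileCleaner.RemoveAllDuplicated(files) modifies files directly, so no temporary array or toDelete list is needed, and the order of the remaining entries is preserved.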

Upvotes: -1

Mohammed Sajid

Reputation: 4903

You can do it with LINQ by using GroupBy and keeping only the groups whose count is 1, as in the following code:
1 - Prepare the list of AmpFile objects:

List<AmpFile> ampFiles = new List<AmpFile>
{
    new AmpFile{filename="testfile1.jpg",actualpath="C:\\testpath\\testfile1.jpg"},
    new AmpFile{filename="brickwall.jpg",actualpath="C:\\testpath\\brickwall.jpg"},
    new AmpFile{filename="mydata.txt",actualpath="D:\\mydata.txt"},
    new AmpFile{filename="testfile1.jpg",actualpath="E:\\demo\\testfile1.jpg"},
    new AmpFile{filename="mydata.txt",actualpath="F:\\somefolder\\mydata.txt"},
    new AmpFile{filename="testfile1.jpg",actualpath="F:\\somefolder\\testfile1.jpg"},
    new AmpFile{filename="testfile2.jpg",actualpath="F:\\somefolder\\testfile2.jpg"},
    new AmpFile{filename="testfile3.jpg",actualpath="D:\\testfile3.jpg"},
};

2 - Call GroupBy and filter with Where:

List<AmpFile> notDuplicatedAmpFiles = ampFiles.GroupBy(x => x.filename)
    .Where(x => x.Count() == 1)
    .SelectMany(x => x)
    .ToList();

3 - Demo:

foreach(AmpFile ampFile in notDuplicatedAmpFiles)
{
    Console.WriteLine($"fileName :{ampFile.filename}, actualPath :{ampFile.actualpath}");
}

4 - Result:

fileName :brickwall.jpg, actualPath :C:\testpath\brickwall.jpg
fileName :testfile2.jpg, actualPath :F:\somefolder\testfile2.jpg
fileName :testfile3.jpg, actualPath :D:\testfile3.jpg
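
On Windows, file names are usually treated as case-insensitive, so "Mydata.txt" and "mydata.txt" might need to count as duplicates. That is an assumption, not something stated in the question. A minimal sketch under that assumption passes a StringComparer to GroupBy; the names AmpFileQueries and UniqueIgnoringCase are made up for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class AmpFile
{
    public string filename { get; set; }
    public string actualpath { get; set; }
}

public static class AmpFileQueries
{
    // Hypothetical helper: keeps only the entries whose filename is unique,
    // comparing filenames without regard to case.
    public static List<AmpFile> UniqueIgnoringCase(IEnumerable<AmpFile> files)
    {
        return files
            .GroupBy(x => x.filename, StringComparer.OrdinalIgnoreCase)
            .Where(g => g.Count() == 1)   // groups with a single member have no duplicates
            .SelectMany(g => g)
            .ToList();
    }
}
```

If exact, case-sensitive matching is what you want, the query in step 2 already does that and no comparer is needed.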

I hope this helps.

Upvotes: 4

Enigmativity

Reputation: 117057

I'd suggest this query:

var results =
    from a in list
    group a by a.filename into gas
    where !gas.Skip(1).Any()
    from ga in gas.Take(1)
    select ga;

If you start with this data:

var list = new List<AmpFile>()
{
    new AmpFile() { filename = "testfile1.jpg", actualpath = @"C:\testpath\testfile1.jpg" },
    new AmpFile() { filename = "brickwall.jpg", actualpath = @"C:\testpath\brickwall.jpg" },
    new AmpFile() { filename = "mydata.txt", actualpath = @"D:\mydata.txt" },
    new AmpFile() { filename = "testfile1.jpg", actualpath = @"E:\demo\testfile1.jpg" },
    new AmpFile() { filename = "mydata.txt", actualpath = @"F:\somefolder\mydata.txt" },
    new AmpFile() { filename = "testfile1.jpg", actualpath = @"F:\somefolder\testfile1.jpg" },
    new AmpFile() { filename = "testfile2.jpg", actualpath = @"F:\somefolder\testfile2.jpg" },
    new AmpFile() { filename = "testfile3.jpg", actualpath = @"D:\testfile3.jpg" },
};

...then you get only the three entries whose filename is unique: brickwall.jpg, testfile2.jpg and testfile3.jpg.

Upvotes: 1

Vivian Mascarenhas

Reputation: 183

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication1
{
public class AmpFile
{
    public string filename { get; set; }
    public string actualpath { get; set; }
}

class Program
{
    static void Main(string[] args)
    {
        List<AmpFile> lstemail = new List<AmpFile>();
        // Verbatim strings (@"...") keep "\t" from being read as a tab character.
        lstemail.Add(new AmpFile { filename = "testfile1.jpg", actualpath = @"C:\testpath\testfile1.jpg" });
        lstemail.Add(new AmpFile { filename = "brickwall.jpg", actualpath = @"C:\testpath\brickwall.jpg" });
        lstemail.Add(new AmpFile { filename = "mydata.txt", actualpath = @"D:\mydata.txt" });
        lstemail.Add(new AmpFile { filename = "testfile1.jpg", actualpath = @"E:\demo\testfile1.jpg" });

        // Keep only the files whose filename occurs exactly once.
        var myUniqueList = lstemail.GroupBy(i => i.filename)
            .Where(g => g.Count() == 1)
            .Select(g => g.First())
            .ToList();
        lstemail = myUniqueList;

    }
}
}

I used LINQ, which I find better than a foreach loop for this.

Upvotes: -1

jdweng

Reputation: 34421

You can implement IEquatable<AmpFile>, as in the code below. Your paths are in different folders, so with this definition of equality (same filename and same path) you do not have any duplicates. See below:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

namespace ConsoleApplication1
{
    class Program
    {
        static void Main(string[] args)
        {
            List<AmpFile> files = new List<AmpFile>() {
                new AmpFile() { filename = "testfile1.jpg", actualpath = @"C:\testpath\testfile1.jpg"}, 
                new AmpFile() { filename = "brickwall.jpg", actualpath = @"C:\testpath\brickwall.jpg"}, 
                new AmpFile() { filename = "mydata.txt", actualpath = @"D:\mydata.txt"}, 
                new AmpFile() { filename = "testfile1.jpg", actualpath = @"E:\demo\testfile1.jpg"}, 
                new AmpFile() { filename = "mydata.txt", actualpath = @"F:\somefolder\mydata.txt"}, 
                new AmpFile() { filename = "testfile1.jpg" , actualpath = @"F:\somefolder\testfile1.jpg"}, 
                new AmpFile() { filename = "testfile2.jpg" , actualpath = @"F:\somefolder\testfile2.jpg"}, 
                new AmpFile() { filename = "testfile3.jpg", actualpath = @"D:\testfile3.jpg"}
            };

            List<AmpFile> output = files.Distinct().ToList();
        }
    }
    public class AmpFile : IEquatable<AmpFile>
    {
        public string filename { get; set; }
        public string actualpath { get; set; }

        public Boolean Equals(AmpFile other)
        {
            // A null argument is never equal to this instance.
            if (other == null) return false;
            return ((this.filename == other.filename) && (this.actualpath == other.actualpath));
        }
        public override int GetHashCode()
        {
            return (this.filename + "^" + this.actualpath).GetHashCode();
        }
    }
}

Upvotes: 0

jscarle

Reputation: 1305

You can also do it in place by running two nested loops over your list.

List<AmpFile> ampList = new List<AmpFile>();
// Populate list

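for (int i = 0; i < ampList.Count; i++)
{
    int before = ampList.Count;
    // Remove later items with the same filename, walking backwards so no index is skipped.
    for (int j = ampList.Count - 1; j > i; j--)
        if (ampList[j].filename == ampList[i].filename)
            ampList.RemoveAt(j);
    // If anything was removed, drop the reference item too and re-check index i.
    if (ampList.Count < before)
        ampList.RemoveAt(i--);
}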

Upvotes: -1
