cs98

Reputation: 129

How can I compare two IEnumerable<> objects and return a new one?

I want to compare two IEnumerable<> objects and return a new IEnumerable<> object.

In the code below I have a newFiles object and an OriginalFiles object. I want to compare these two IEnumerable<> objects and find the files that are new as well as the files that have been modified.

The FileConfig class holds the MD5 hash of each file as a string, so I can compare OriginalFiles and newFiles on MD5Hash to figure out which files have changed, and then build a new IEnumerable<FileConfig> containing the modified files plus the new files.

For example: if newFiles has 10 files and OriginalFiles has 8, then there are two new files. I then compare the remaining 8 by MD5Hash; if 5 of those 8 have changed, I should return 7 files in total (5 modified + 2 new) as an IEnumerable<FileConfig>.

public class ProcessFile
{
    public IEnumerable<FileConfig> OriginalFiles { get; set; }


    public IEnumerable<FileConfig> GetNewFiles(IEnumerable<FileConfig> newFiles)
    {
        // compare OriginalFiles and newFiles object and return a new IEnumerable<FileConfig> object 
        // which has only those files which are modified or new by comparing on md5Hash string
        foreach (var element1 in newFiles)
        {
            var newFileName = element1.Name;
            var newMd5Hash = element1.MD5Hash;
            foreach (var element2 in this.OriginalFiles)
            {
                var originalFileName = element2.Name;
                var originalmd5Hash = element2.MD5Hash;
                if (newFileName.Equals(originalFileName, StringComparison.InvariantCultureIgnoreCase) && !newMd5Hash.Equals(originalmd5Hash, StringComparison.InvariantCultureIgnoreCase))
                {
                    yield return new FileConfig
                    {
                        Name = newFileName,
                        Timestamp = element1.Timestamp,
                        MD5Hash = newMd5Hash
                    };
                }
            }
        }

    }
}

public class FileConfig
{
    public string Name { get; set; }
    public DateTime Timestamp { get; set; }
    public string MD5Hash { get; set; }
}

I can run two nested loops, compare the files by their MD5Hash string, and return the modified ones as a new IEnumerable<FileConfig>, but is there a shortcut or a better way to do the same thing in C#?

Upvotes: 0

Views: 1451

Answers (2)

Tomáš Filip

Reputation: 817

In your position I would use LINQ. Note that we don't know what your FileConfig class looks like.

This example returns the list of new and changed files.

I have used FileInfo properties here. FileInfo is sealed, so FileConfig cannot inherit from it; instead, give FileConfig the properties you need to compare (such as the name, length, or hash).

public class ProcessFile
    {
        public IEnumerable<FileInfo> OriginalFiles { get; set; }


        public IEnumerable<FileInfo> GetNewFiles(IEnumerable<FileInfo> newFiles)
        {
            List<FileInfo> result = new List<FileInfo>();
            result.AddRange(newFiles.Where(x => !OriginalFiles.Any(a => a.FullName == x.FullName) || OriginalFiles.Any(a => a.FullName == x.FullName && a.Length != x.Length)));
            return result;

        }
    }

It should be straightforward if you are familiar with LINQ. If not, I would suggest reading up on it: https://learn.microsoft.com/cs-cz/dotnet/csharp/tutorials/working-with-linq

If you have any questions, I will be glad to help.

LINQ with FileConfig:

public class ProcessFile
    {
        public IEnumerable<FileConfig> OriginalFiles { get; set; }


        public IEnumerable<FileConfig> GetNewFiles(IEnumerable<FileConfig> newFiles)
        {
            List<FileConfig> result = new List<FileConfig>();
            result.AddRange(newFiles.Where(x => !OriginalFiles.Any(a => a.Name == x.Name) || OriginalFiles.Any(a => a.Name == x.Name && a.MD5Hash != x.MD5Hash)));
            return result;

        }
    }

Returning IEnumerable only:

public class ProcessFile
    {
        public IEnumerable<FileConfig> OriginalFiles { get; set; }


        public IEnumerable<FileConfig> GetNewFiles(IEnumerable<FileConfig> newFiles)
        {
            // Keep a file unless an original with the same name and the same hash exists,
            // i.e. keep files that are new or whose hash has changed.
            return newFiles.Where(x => !OriginalFiles.Any(a => a.Name == x.Name && a.MD5Hash == x.MD5Hash));

        }
    }
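
The Where/Any approach rescans OriginalFiles for every new file. If the lists are large, one possible alternative (a swapped-in technique, not what the code above does) is to index the originals by name once and look each new file up. The following is only a sketch: the FileDiff class and NewOrChanged method are hypothetical names, and it assumes file names in OriginalFiles are unique when compared case-insensitively, as in the question's comparison.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FileConfig
{
    public string Name { get; set; }
    public DateTime Timestamp { get; set; }
    public string MD5Hash { get; set; }
}

// Hypothetical helper for illustration; not part of the original code.
public static class FileDiff
{
    public static IEnumerable<FileConfig> NewOrChanged(
        IEnumerable<FileConfig> originalFiles,
        IEnumerable<FileConfig> newFiles)
    {
        // Build a name -> hash lookup once.
        // Assumption: names are unique ignoring case; ToDictionary throws otherwise.
        var hashByName = originalFiles.ToDictionary(
            f => f.Name, f => f.MD5Hash, StringComparer.OrdinalIgnoreCase);

        // A file is returned if its name is unknown (new)
        // or its hash differs from the stored one (modified).
        return newFiles.Where(f =>
            !hashByName.TryGetValue(f.Name, out var oldHash) ||
            !string.Equals(oldHash, f.MD5Hash, StringComparison.OrdinalIgnoreCase));
    }
}
```

The dictionary is built when the method is called, while the filtering itself stays deferred until the result is enumerated.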

Upvotes: 1

Enigmativity

Reputation: 117144

It seems to me that you need a left outer join to get the case of new files and a change of an existing one. This should do it:

public IEnumerable<FileConfig> GetNewFiles(IEnumerable<FileConfig> newFiles) =>
    from element1 in newFiles
    join element2 in this.OriginalFiles
        on element1.Name.ToLowerInvariant() equals element2.Name.ToLowerInvariant()
        into g
    where !g.Any() || !element1.MD5Hash.Equals(g.First().MD5Hash, StringComparison.InvariantCultureIgnoreCase)
    select new FileConfig
    {
        Name = element1.Name,
        Timestamp = element1.Timestamp,
        MD5Hash = element1.MD5Hash,
    };

If you made FileConfig read-only (immutable), you could return the existing instances instead of copying them:

public IEnumerable<FileConfig> GetNewFiles(IEnumerable<FileConfig> newFiles) =>
    from element1 in newFiles
    join element2 in this.OriginalFiles
        on element1.Name.ToLowerInvariant() equals element2.Name.ToLowerInvariant()
        into g
    where !g.Any() || !element1.MD5Hash.Equals(g.First().MD5Hash, StringComparison.InvariantCultureIgnoreCase)
    select element1;
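
To try the join on a small data set, something like the following console program could be used. It is only a sketch: the sample file names and hashes are made up, the Program class is hypothetical, and it assumes each name occurs at most once in OriginalFiles (the query only looks at the first match).

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public class FileConfig
{
    public string Name { get; set; }
    public DateTime Timestamp { get; set; }
    public string MD5Hash { get; set; }
}

public class ProcessFile
{
    public IEnumerable<FileConfig> OriginalFiles { get; set; }

    // Group join: g holds the originals whose name matches element1 (empty for a new file).
    public IEnumerable<FileConfig> GetNewFiles(IEnumerable<FileConfig> newFiles) =>
        from element1 in newFiles
        join element2 in this.OriginalFiles
            on element1.Name.ToLowerInvariant() equals element2.Name.ToLowerInvariant()
            into g
        where !g.Any() || !element1.MD5Hash.Equals(g.First().MD5Hash, StringComparison.InvariantCultureIgnoreCase)
        select element1;
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data: a.txt unchanged, b.txt modified, c.txt new.
        var processor = new ProcessFile
        {
            OriginalFiles = new[]
            {
                new FileConfig { Name = "a.txt", MD5Hash = "111" },
                new FileConfig { Name = "b.txt", MD5Hash = "222" },
            }
        };

        var incoming = new[]
        {
            new FileConfig { Name = "a.txt", MD5Hash = "111" },
            new FileConfig { Name = "b.txt", MD5Hash = "999" },
            new FileConfig { Name = "c.txt", MD5Hash = "333" },
        };

        foreach (var file in processor.GetNewFiles(incoming))
        {
            Console.WriteLine(file.Name); // prints b.txt, then c.txt
        }
    }
}
```

The results come back in the order of newFiles, because a group join preserves the order of the outer sequence.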

Upvotes: 1
