SyncMaster

Reputation: 9916

File comparison in C#

Is there any built-in class/method for comparing the content of two audio/video files? Or is there any built-in class/method for converting an audio/video file to a bit stream?

Upvotes: 2

Views: 4186

Answers (6)

Free User

Reputation: 31

Example: Binary Comparison of 2 Files

// Requires: using System; using System.IO;

/// <summary>
/// Performs a binary comparison of two files and
/// returns the result of the comparison.
/// </summary>
/// <param name="p_FileA">Fully qualified path to the first file.</param>
/// <param name="p_FileB">Fully qualified path to the second file.</param>
/// <returns>True if the files are binary identical, otherwise false.</returns>
/// <remarks>Despite its name, this method returns true when the files are EQUAL.</remarks>
private static bool FileDiffer(string p_FileA, string p_FileB)
{
    bool retVal = true;
    FileInfo infoA = null;
    FileInfo infoB = null;
    byte[] bufferA = new byte[4096];
    byte[] bufferB = new byte[4096];
    int bufferRead = 0;

    // Check that both files exist
    if (!File.Exists(p_FileA))
    {
        throw new ArgumentException(String.Format("The file '{0}' could not be found", p_FileA), "p_FileA");
    }
    if (!File.Exists(p_FileB))
    {
        throw new ArgumentException(String.Format("The file '{0}' could not be found", p_FileB), "p_FileB");
    }

    // Create FileInfo objects to get the file sizes
    infoA = new FileInfo(p_FileA);
    infoB = new FileInfo(p_FileB);

    // Only start a byte comparison if the file sizes are equal
    if (infoA.Length == infoB.Length)
    {
        // Binary comparison
        using (BinaryReader readerA = new BinaryReader(File.OpenRead(p_FileA)))
        using (BinaryReader readerB = new BinaryReader(File.OpenRead(p_FileB)))
        {
            // Read the file streams block by block into the buffers
            while ((bufferRead = readerA.Read(bufferA, 0, bufferA.Length)) > 0)
            {
                // The file sizes are equal, so the second file has at least as many
                // bytes left. Read() may return fewer bytes than requested, so keep
                // reading until bufferB holds the same number of bytes as bufferA.
                int totalB = 0;
                while (totalB < bufferRead)
                {
                    int n = readerB.Read(bufferB, totalB, bufferRead - totalB);
                    if (n == 0)
                    {
                        // Second file ended early (e.g. it changed on disk)
                        return false;
                    }
                    totalB += n;
                }

                // Byte-by-byte comparison within the buffer
                for (int i = 0; i < bufferRead; i++)
                {
                    if (bufferA[i] != bufferB[i])
                    {
                        retVal = false;
                        break;
                    }
                }

                // If the comparison has already failed, stop here
                if (!retVal)
                {
                    break;
                }
            }
        }
    }
    else
    {
        // The file sizes already differ
        retVal = false;
    }

    return retVal;
}

Upvotes: 2

Free User

Reputation: 31

Example: Generating SHA1 and MD5 hashes in .NET (C#)

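// Requires: using System.IO; using System.Security.Cryptography; using System.Text;
public static string GenerateHash(string filePathAndName)
{
  // Hash from a stream so large audio/video files are not loaded into memory at once
  using (FileStream stream = File.OpenRead(filePathAndName))
  using (HashAlgorithm hasher = SHA1.Create()) // SHA1.Create() or MD5.Create()
  {
    byte[] hashData = hasher.ComputeHash(stream);

    StringBuilder hashText = new StringBuilder(hashData.Length * 2);
    foreach (byte b in hashData)
    {
      // "x2" = two lowercase hex digits per byte, zero-padded
      // (lowercase for compatibility on case-sensitive systems)
      hashText.Append(b.ToString("x2"));
    }

    return hashText.ToString();
  }
}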

Upvotes: 1

Anuraj

Reputation: 19598

There is no built-in way to compare files directly. And since you have to deal with audio/video files, which will be relatively big, I'm not sure whether a bitwise comparison will be practical.

Upvotes: 2

Paul Stovell

Reputation: 32715

The other answers are good - either hashing (if you are comparing the file to multiple candidates) or a byte-wise comparison (if comparing two single files).
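For the "one file against many candidates" case, the idea is to hash every candidate once and then look the target's hash up, instead of comparing bytes against each candidate. A minimal sketch, assuming SHA-256 (swapped in here, not something the answers above prescribe) and a hypothetical CandidateIndex class:

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography;

static class CandidateIndex // hypothetical helper, for illustration only
{
    // Hex-encodes the SHA-256 hash of a file, reading it as a stream.
    static string HashFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            return BitConverter.ToString(sha.ComputeHash(stream));
        }
    }

    // Hashes every candidate once, grouping paths that share the same hash.
    public static Dictionary<string, List<string>> Build(IEnumerable<string> candidates)
    {
        var index = new Dictionary<string, List<string>>();
        foreach (string path in candidates)
        {
            string hash = HashFile(path);
            if (!index.TryGetValue(hash, out List<string> paths))
            {
                paths = new List<string>();
                index[hash] = paths;
            }
            paths.Add(path);
        }
        return index;
    }

    // Returns the candidates whose hash matches the target file's hash.
    // A match means "identical with very high probability"; a byte-wise
    // comparison can confirm it if certainty is required.
    public static IReadOnlyList<string> FindMatches(Dictionary<string, List<string>> index, string target)
    {
        if (index.TryGetValue(HashFile(target), out List<string> paths))
        {
            return paths;
        }
        return new List<string>();
    }
}
```

The index only pays off when it is reused: building it reads every candidate in full once, so for a single pair of files a direct comparison is cheaper.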

Here are a couple of additional thoughts:

First, check the file sizes - if they are different, then don't waste time comparing bytes. File sizes are quick to check.
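The size check can be a one-liner on FileInfo.Length, which reads file metadata rather than file content. A small sketch (the SizeCheck class name is hypothetical):

```csharp
using System.IO;

static class SizeCheck // hypothetical helper, for illustration only
{
    // True when the two files cannot be identical because their lengths differ.
    // Only metadata is read here, not the file contents.
    public static bool SizesDiffer(string pathA, string pathB)
    {
        return new FileInfo(pathA).Length != new FileInfo(pathB).Length;
    }
}
```

A true result settles the question; a false result only means a content comparison is still needed.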

Second, try searching from the end or the middle of the file using a binary chop approach.

E.g., suppose you have a file like this:

ABCDEFGHIJKLMNOP

Then it is modified to this:

ABCDEF11GHIJKLMN

If content has been inserted but the file size has stayed the same, the bytes after the insertion point are shifted along and the last ones are "knocked out". So a binary chop approach might pick this up with fewer reads (e.g., seek to and read bytes SIZE/2-10 to SIZE/2+10 from both files, and compare).
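The midpoint probe from the example can be sketched like this. It assumes both files already have the same length, and WindowProbe, ReadAt and the radius parameter are hypothetical names chosen for illustration:

```csharp
using System;
using System.IO;
using System.Linq;

static class WindowProbe // hypothetical helper, for illustration only
{
    // Reads up to `count` bytes starting at `offset`, looping because
    // Stream.Read may return fewer bytes than requested.
    static byte[] ReadAt(string path, long offset, int count)
    {
        using (FileStream fs = File.OpenRead(path))
        {
            fs.Seek(offset, SeekOrigin.Begin);
            byte[] buffer = new byte[count];
            int total = 0;
            while (total < count)
            {
                int n = fs.Read(buffer, total, count - total);
                if (n == 0) break; // reached end of file
                total += n;
            }
            Array.Resize(ref buffer, total);
            return buffer;
        }
    }

    // Compares the window SIZE/2 - radius .. SIZE/2 + radius in both files.
    // Returns true if a difference was found there. Assumes equal file sizes.
    public static bool MiddleDiffers(string pathA, string pathB, int radius = 10)
    {
        long size = new FileInfo(pathA).Length;
        long start = Math.Max(0, size / 2 - radius);
        int count = (int)Math.Min(2L * radius, size - start);
        byte[] a = ReadAt(pathA, start, count);
        byte[] b = ReadAt(pathB, start, count);
        return !a.SequenceEqual(b);
    }
}
```

With the 16-byte example files above and a radius of 2, the window covers bytes 6 to 9, which hold "GHIJ" in the original and "11GH" in the modified file, so the difference is found after reading only four bytes from each file.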

You could try to combine the techniques. If you measure this over a representative sample of the data you deal with, you might find that of all the differing files you compare (example figures):

  • 80% were found because the file size was different (10ms per file)
  • 10% were found due to binary chop (50ms per file)
  • 10% were found due to linear byte comparisons (2000ms per file)

Doing a binary chop over the whole file wouldn't be so smart, since I expect the hard disk will be faster when reading linearly than when seeking to random spots. But if you check SIZE/2, then SIZE/4 and 3*SIZE/4, then SIZE/8, 3*SIZE/8, 5*SIZE/8 and 7*SIZE/8, for say 5 iterations, you might find most of the differences without having to do a byte-wise comparison. Just some ideas.
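The sampling schedule described here (SIZE/2, then the quarters, then the eighths, and so on) can be sketched as follows. The default of 5 levels follows the suggestion above; the 64-byte window and the SampledProbe name are assumptions. Finding no difference does not prove the files are equal, so this only works as a cheap pre-filter before a full comparison:

```csharp
using System;
using System.IO;
using System.Linq;

static class SampledProbe // hypothetical helper, for illustration only
{
    // Probes small windows at SIZE/2, then SIZE/4 and 3*SIZE/4, then the odd
    // multiples of SIZE/8, and so on for `levels` iterations.
    // Returns true as soon as any window differs.
    public static bool SamplesDiffer(string pathA, string pathB, int levels = 5, int window = 64)
    {
        long size = new FileInfo(pathA).Length;
        if (size != new FileInfo(pathB).Length) return true;

        using (FileStream fa = File.OpenRead(pathA))
        using (FileStream fb = File.OpenRead(pathB))
        {
            byte[] bufA = new byte[window];
            byte[] bufB = new byte[window];
            for (int level = 1; level <= levels; level++)
            {
                long parts = 1L << level;           // 2, 4, 8, ...
                for (long k = 1; k < parts; k += 2) // odd k: points not probed before
                {
                    long offset = size * k / parts;
                    int count = (int)Math.Min(window, size - offset);
                    if (count <= 0) continue;
                    int readA = ReadBlock(fa, offset, bufA, count);
                    int readB = ReadBlock(fb, offset, bufB, count);
                    if (readA != readB || !bufA.Take(readA).SequenceEqual(bufB.Take(readB)))
                        return true;
                }
            }
        }
        return false; // no difference in the samples; the files may still differ
    }

    // Seeks to `offset` and reads up to `count` bytes, looping over short reads.
    static int ReadBlock(FileStream fs, long offset, byte[] buffer, int count)
    {
        fs.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            int n = fs.Read(buffer, total, count - total);
            if (n == 0) break;
            total += n;
        }
        return total;
    }
}
```

Five levels means 1 + 2 + 4 + 8 + 16 = 31 small reads per file, which stays cheap even for very large media files.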

Also, instead of reading from the front of the file, perhaps try reading from the end of the file backwards. Again, you might be trading seek time for probability, but in the "insert" scenario, assuming a change is made halfway into the file, you'll probably find it faster by starting from the end than from the start: every byte after the insertion point has shifted, so the end of the file already differs.
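A backward block-by-block comparison might look like the sketch below. The 4096-byte block size and the BackwardCompare and ReadBlock names are assumptions; each block is read forwards from disk and then scanned from its last byte to its first:

```csharp
using System;
using System.IO;

static class BackwardCompare // hypothetical helper, for illustration only
{
    // Compares two files block by block, starting at the end and moving
    // towards the start. Returns true if the files are identical.
    public static bool FilesEqualFromEnd(string pathA, string pathB, int blockSize = 4096)
    {
        long size = new FileInfo(pathA).Length;
        if (size != new FileInfo(pathB).Length) return false;

        byte[] bufA = new byte[blockSize];
        byte[] bufB = new byte[blockSize];
        using (FileStream fa = File.OpenRead(pathA))
        using (FileStream fb = File.OpenRead(pathB))
        {
            long end = size;
            while (end > 0)
            {
                int count = (int)Math.Min(blockSize, end);
                long start = end - count;
                ReadBlock(fa, start, bufA, count);
                ReadBlock(fb, start, bufB, count);
                // Scan this block from its last byte backwards
                for (int i = count - 1; i >= 0; i--)
                {
                    if (bufA[i] != bufB[i]) return false;
                }
                end = start;
            }
        }
        return true;
    }

    // Seeks to `offset` and fills exactly `count` bytes, looping over short reads.
    static void ReadBlock(FileStream fs, long offset, byte[] buffer, int count)
    {
        fs.Seek(offset, SeekOrigin.Begin);
        int total = 0;
        while (total < count)
        {
            int n = fs.Read(buffer, total, count - total);
            if (n == 0) throw new EndOfStreamException();
            total += n;
        }
    }
}
```

For the "ABCDEF11GHIJKLMN" example, the very first byte checked ('N' against 'P') already differs, whereas a forward scan would have to reach offset 6.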

Upvotes: 3

sipsorcery

Reputation: 30699

You could use the hash functions in System.Security.Cryptography on two file streams and compare them. This is easy to do and works well for small files. If your files are big, which they probably are if you're dealing with audio/video, then reading in the file and generating the hash can take a bit of time.
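Hashing two file streams and comparing the results could look like this. It is a sketch that assumes SHA-256 (any HashAlgorithm from System.Security.Cryptography would work the same way), and HashCompare is a hypothetical name. Passing a FileStream to ComputeHash avoids loading the whole file into memory, but every byte of both files is still read:

```csharp
using System.IO;
using System.Linq;
using System.Security.Cryptography;

static class HashCompare // hypothetical helper, for illustration only
{
    // Hashes a file from a stream, so the file is never fully held in memory.
    static byte[] HashFile(string path)
    {
        using (FileStream stream = File.OpenRead(path))
        using (SHA256 sha = SHA256.Create())
        {
            return sha.ComputeHash(stream);
        }
    }

    // True if both files produce the same SHA-256 hash, which means they are
    // identical with overwhelming probability.
    public static bool SameHash(string pathA, string pathB)
    {
        return HashFile(pathA).SequenceEqual(HashFile(pathB));
    }
}
```

For a one-off comparison of two files, this does at least as much I/O as a direct byte comparison, which can stop at the first difference; hashing pays off when a hash can be stored and reused.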

Upvotes: 3

Chris

Reputation: 40613

You could do a byte-wise comparison of the two files. System.IO.File.ReadAllBytes(...) would be useful for that.
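With File.ReadAllBytes the comparison becomes very short. A minimal sketch (the ReadAllCompare name is hypothetical) using LINQ's SequenceEqual:

```csharp
using System.IO;
using System.Linq;

static class ReadAllCompare // hypothetical helper, for illustration only
{
    // Loads both files completely into memory and compares them byte by byte.
    // Simple, but memory use grows with file size, so it best suits small files.
    public static bool FilesEqual(string pathA, string pathB)
    {
        byte[] a = File.ReadAllBytes(pathA);
        byte[] b = File.ReadAllBytes(pathB);
        return a.SequenceEqual(b);
    }
}
```

For large audio/video files, a buffered stream comparison such as the FileDiffer method in the first answer keeps memory use constant instead.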

Upvotes: 2
