Get file type without using the file extension in C#

I know this has been asked before, but none of the solutions worked for me. I want to know whether the file uploaded to my server (via an .ashx handler) is of type .xlsx, .xls or .csv.

I tried using the magic numbers listed here, but if I, for example, change the extension of an .msi to .xls, the file is still recognized as .xls. The following code illustrates what I mean:

private bool IsValidFileType(HttpPostedFile file)
{
    using (var memoryStream = new MemoryStream())
    {
        file.InputStream.CopyTo(memoryStream);
        byte[] buffer = memoryStream.ToArray();

        // Reject exe and dll ("MZ" header); guard against files shorter than 2 bytes
        if (buffer.Length >= 2 && buffer[0] == 0x4D && buffer[1] == 0x5A)
        {
            return false;
        }

        // Check xlsx (ZIP signature "PK" 03 04, or "PK" 05 06 for an empty archive).
        // Four bytes are read, and the || is parenthesized so the length check applies to both cases.
        if (buffer.Length >= 4 &&
            buffer[0] == 0x50 && buffer[1] == 0x4B &&
            ((buffer[2] == 0x03 && buffer[3] == 0x04) ||
             (buffer[2] == 0x05 && buffer[3] == 0x06)))
        {
            return true;
        }

        // Check xls (8-byte compound file signature, so at least 8 bytes are required)
        if (buffer.Length >= 8 &&
            buffer[0] == 0xD0 && buffer[1] == 0xCF &&
            buffer[2] == 0x11 && buffer[3] == 0xE0 &&
            buffer[4] == 0xA1 && buffer[5] == 0xB1 &&
            buffer[6] == 0x1A && buffer[7] == 0xE1)
        {
            return true;
        }

        return false;
    }
}

Then I tried using urlmon.dll, with something like the following, but it still recognizes the file as .xls:

    [DllImport("urlmon.dll", CharSet = CharSet.Unicode, ExactSpelling = true, SetLastError = false)]
    static extern int FindMimeFromData(
        IntPtr pBC,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzUrl,
        [MarshalAs(UnmanagedType.LPArray, ArraySubType=UnmanagedType.I1, SizeParamIndex=3)] byte[] pBuffer,
        int cbSize,
        [MarshalAs(UnmanagedType.LPWStr)] string pwzMimeProposed,
        int dwMimeFlags,
        out IntPtr ppwzMimeOut,
        int dwReserved);

    public static string GetMimeFromFile(string file)
    {
        if (!File.Exists(file))
            throw new FileNotFoundException(file + " not found");

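        // Cap the sample at 4096 bytes before casting, so very large files cannot overflow the int
        int MaxContent = (int)Math.Min(new FileInfo(file).Length, 4096);
        byte[] buf = new byte[MaxContent];

        // Dispose the stream even if the read throws
        using (FileStream fs = File.OpenRead(file))
        {
            fs.Read(buf, 0, MaxContent);
        }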
        int result = FindMimeFromData(IntPtr.Zero, file, buf, MaxContent, null, 0, out IntPtr mimeout, 0);

        if (result != 0)
            throw Marshal.GetExceptionForHR(result);
        string mime = Marshal.PtrToStringUni(mimeout);
        Marshal.FreeCoTaskMem(mimeout);
        return mime;
    }

I was thinking that maybe I should try to open the uploaded file with a library, for example ExcelDataReader, but I'm not sure whether this is the best approach.

Any help would be appreciated.

Upvotes: 0

Views: 2750

Answers (3)

Mario Z

Reputation: 4381

I tried using the magic numbers listed here, but if I, for example, change the extension of an .msi to .xls, the file is still recognized as .xls. The following code illustrates what I mean:

Yes, that is true. The only thing you can determine by checking a file's signature is the container format the file is based on. So for an ".xls" file you will detect that the file is in the compound binary format. However, as you noticed, this format is also used by ".msi" files, as well as ".doc", ".ppt", etc.

The same is true for your ".xlsx" detection: it only checks that the file is in the zip format, and the same signature will be found in ".zip", ".docx", ".ods", etc.
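One way to narrow a zip-based file down further is to look inside the archive. The following is only a sketch, assuming .NET 4.5+ with System.IO.Compression available; the class and method names are made up for illustration. It checks for the two parts Excel normally writes into an .xlsx package. The package format allows the workbook part to live at a different path, and other Excel flavors such as .xlsm contain the same entries, so treat the result as a strong hint rather than proof.

```csharp
using System;
using System.IO;
using System.IO.Compression;

public static class XlsxProbe
{
    // Hypothetical helper: returns true when the stream is a readable ZIP
    // archive containing the entries an Excel-written .xlsx normally has.
    public static bool LooksLikeXlsx(Stream stream)
    {
        try
        {
            // leaveOpen: true so the caller can keep using the stream afterwards
            using (var zip = new ZipArchive(stream, ZipArchiveMode.Read, leaveOpen: true))
            {
                return zip.GetEntry("[Content_Types].xml") != null
                    && zip.GetEntry("xl/workbook.xml") != null;
            }
        }
        catch (InvalidDataException)
        {
            // Not a ZIP archive at all, or a corrupt one
            return false;
        }
    }
}
```

A plain ".zip" or a ".docx" passes the signature check but fails this one, because it has no "xl/workbook.xml" entry.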

So, you could check the file's signature and let through files that are in one of those two formats, but what about ".csv"? Here you can have various byte values because it's just plain text; it doesn't have a signature.
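Since there is no signature to test, the most a ".csv" check can do is a plausibility test on the text itself. The sketch below is a rough heuristic, not a CSV validator: the helper name is invented, it assumes UTF-8 or ANSI input with a known delimiter, and it will misjudge files with quoted fields that contain the delimiter or line breaks, UTF-16 files, and single-column files.

```csharp
using System;
using System.Linq;
using System.Text;

public static class CsvProbe
{
    // Hypothetical heuristic: "looks like delimited text" rather than "is a CSV".
    public static bool LooksLikeCsv(byte[] buffer, char delimiter = ',')
    {
        if (buffer == null || buffer.Length == 0) return false;

        // Binary formats (exe, xls, zip) almost always contain NUL bytes;
        // UTF-8 and ANSI text does not.
        if (Array.IndexOf(buffer, (byte)0) >= 0) return false;

        string text = Encoding.UTF8.GetString(buffer);
        string[] lines = text.Split(new[] { "\r\n", "\n" }, StringSplitOptions.RemoveEmptyEntries);

        // Count delimiters on the first few lines and require a consistent, non-zero count.
        int[] counts = lines.Take(5).Select(l => l.Count(c => c == delimiter)).ToArray();
        return counts.Length > 0 && counts[0] > 0 && counts.All(c => c == counts[0]);
    }
}
```

In practice a real CSV parser that reports malformed rows gives a more reliable answer than any check of this kind.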

Anyway, I think the real question is: what is your goal with those Excel files? Do you need to process them further?
If you do, then you should rely on the failure behavior of whatever reads the file. Whichever library you pick to read the file will most likely throw an exception because of either an "unrecognized format" or an "unrecognized structure" of the file.

By "unrecognized structure" I mean that, for instance, an ".xls" file is expected to contain streams named "Workbook", "SummaryInformation", etc.
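As a sketch of relying on the reader to fail, here is how this might look with ExcelDataReader, the library mentioned in the question. It assumes the ExcelDataReader 3.x NuGet package; the class and method names are invented, and because the exact exception types vary by failure, the example catches broadly, which you may want to narrow in real code.

```csharp
using System;
using System.IO;
using ExcelDataReader;

public static class ExcelProbe
{
    // Hypothetical helper: true when ExcelDataReader can open the stream as .xls or .xlsx.
    public static bool CanBeReadAsExcel(Stream stream)
    {
        try
        {
            // CreateReader inspects the content, not the file name, to pick a format.
            // Disposing the reader also closes the stream with the default configuration.
            using (IExcelDataReader reader = ExcelReaderFactory.CreateReader(stream))
            {
                return reader.ResultsCount > 0;
            }
        }
        catch (Exception)
        {
            // Unrecognized signature or missing workbook structure: not a usable Excel file
            return false;
        }
    }
}
```

With an upload this could be called as CanBeReadAsExcel(file.InputStream). If you process the workbook afterwards anyway, it is simpler to wrap the real processing in the same try/catch instead of opening the file twice.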

Upvotes: 0

Joshua VdM

Reputation: 668

A file in itself is just data. The file extension allows your system to interpret that data accordingly. Without a file extension, there's no way of knowing with absolute certainty which file type you're looking at. (Unless you're working with a limited subset of file types)

You can, however, infer from the data which file type it MIGHT be. The project that Thierry V referenced is out of date and no longer maintained.

You might instead want to look at a tool like TrID, which uses a continually growing library of file types. This tool will analyze a file and give a ranking of the most probable file types. Like I said before, it can only tell you with a limited amount of certainty which file type it might be.

Upvotes: 0

Antoine V

Reputation: 7204

How about opening the file with EPPlus or Interop and catching the exception if it isn't an Excel file?

FileInfo fileInfo = new FileInfo(filePath);
ExcelPackage package = null;
try
{
    package = new ExcelPackage(fileInfo);
}
catch (Exception)
{
    // The file could not be opened as an Excel package, so treat it as invalid
}

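Or there is a third-party library (not tested) that verifies the type of a file.

// Verbatim string, so the backslash in the path is not treated as an escape sequence
FileInfo file = new FileInfo(@"C:\Hello.pdf");
// isExcel() is an extension method provided by the third-party library
if (file.isExcel())
    Console.WriteLine("File is Excel");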

Upvotes: 1
