proudgeekdad
proudgeekdad

Reputation: 3445

How can I identify a file type from a blob/filestream?

We bought an "off the shelf" application a lonnng time ago that is capable of storing files as a blob within SQL Server. We've noticed that the database has more than doubled in size within the past six months due to more frequent usage of this blob field by one department. As a result, the application has become painfully slow.

I've been tasked with removing the blob field from the database and saving the file onto the actual file system. Unfortunately, the application does not store what the file type is within the database. Although I can read the file as it exists in the database, I don't know what extension to save the file as. The application's support desk no longer supports this version of the software and will not talk to us about extracting the data. Unfortunately, we do not have access to their source code.

Any suggestions would be greatly appreciated! Thanks in advance!

Upvotes: 9

Views: 10710

Answers (7)

Eric Smith
Eric Smith

Reputation: 5392

In your words "... I've been tasked with ..." - are you sure this is not a problem that can be solved by performing some optimisation on the database? Going down the path of hacking a legacy application for which you don't have the source code and for which there is no support is something that you want to avoid as much as possible.

So, look at the problem here - slowness of the application. So why is it slow? Something a database index couldn't solve?

If you have to extract the BLOBs to the filesystem, then why is the format of the file relevant? Surely the application is designed to do a query on the database in order to get the file anyway--how would extracting the BLOBs to the filesystem help you?

As an aside though - I usually use the Unix file command to identify files through magic numbers.

Upvotes: 0

Matt Spradley
Matt Spradley

Reputation: 8434

Another option would be to physically partition your database. You could store the table that contains the blob field in a different database file that could be stored on a different hard disk/spindle. The disk that contains the blob table could be further optimized by making it RAID 0 or RAID 10.

This could speed things up. Also, this approach eliminates the need to change the application logic.

Upvotes: 0

M4N
M4N

Reputation: 96596

You could use the FindMimeFromData() function in UrlMon.dll (using pinvoke).

See this page for an example and this MSDN page for the documentation of the function.

Upvotes: 1

Eric
Eric

Reputation: 19873

How do you know that this information is human readable?

files extensions purposes are to associate a particular software for opening the file so that when you double click on a .txt file, notepad.exe handles it.

If you're only writing them to the filesystem for the sake of offloading the database, then there is no need for any specific extension. If you want to associate a software with it, then use the appropriate extension

Upvotes: 0

Rob Boek
Rob Boek

Reputation: 1973

You might try using TriD http://mark0.net/soft-trid-e.html

It will scan the files and try to determine the extension.

Upvotes: 2

Paul Sonier
Paul Sonier

Reputation: 39510

Don't save it as any type. Save it as a file with no extension. If you don't know what it is, don't fake it. If the app that saved it requests it, return it from the filesystem the same way it would be returned from the database; as binary data. The database doesn't care what type of data the Binary Object is; neither should you.

Upvotes: 3

BlackTigerX
BlackTigerX

Reputation: 6146

you can look at the first few bytes and figure it out for the most common file types

http://www.garykessler.net/library/file_sigs.html

Upvotes: 8

Related Questions