TahoeWolverine

Reputation: 1744

How do I validate that an NSData is a PDF?

I'm working on a feed-reading iPhone app that displays NSData objects (HTML and PDF) in a UIWebView, and I've hit a snag in some PDF validation logic. I have an NSData object that I know contains a file with a .pdf extension, and I would like to stop invalid PDFs from getting any further. Here's my first attempt at validation code, which seems to work in the majority of cases:

// pdfData is an NSData *
NSData *validPDF = [[NSString stringWithString:@"%PDF"] dataUsingEncoding: NSASCIIStringEncoding];
if (!(pdfData && [[pdfData subdataWithRange:NSMakeRange(0, 4)] isEqualToData:validPDF])) {
    // error
}

Unfortunately, a new PDF was uploaded a few days ago. It is valid in the sense that the UIWebView displays it fine, yet it fails my validation test. I have tracked the issue down to a bunch of garbage bytes at the beginning of the file, with the %PDF coming midway through the 14th group of hex characters (the 25, or %, is at zero-based byte offset 54):

%PDF: 25504446
Breaking PDF: 00010000 00ffffff ff010000 00000000 000f0100 0000b5e0 04000200 01000000 ffffffff 01000000 00000000 0f010000 0099e004 00022550 44462d31 etc...
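
As a rough cross-check of that offset, the dump can be turned back into bytes and searched for the marker. This is only a sketch: it uses the 60 bytes shown above, and the names `hexDump`, `marker` and `headerOffset` are made up for illustration.

```swift
import Foundation

// The 60 bytes shown in the dump above, as hex text.
let hexDump = "00010000 00ffffff ff010000 00000000 000f0100 0000b5e0 04000200 01000000 "
    + "ffffffff 01000000 00000000 0f010000 0099e004 00022550 44462d31"

// Convert the hex text into raw bytes, two hex digits per byte.
let digits = Array(hexDump.filter { $0.isHexDigit })
var bytes = [UInt8]()
for i in stride(from: 0, to: digits.count - 1, by: 2) {
    bytes.append(UInt8(String(digits[i...i + 1]), radix: 16)!)
}
let data = Data(bytes)

// Locate the "%PDF" marker anywhere in the data, not just at offset 0.
let marker = Data("%PDF".utf8)
let headerOffset = data.range(of: marker)?.lowerBound
print(headerOffset as Any)
```

A prefix-only comparison against the first four bytes cannot match this file, because the marker sits well past the start.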

What is the best practice for validating NSData to be a PDF?
What might be wrong with this particular PDF (it claims it was encoded by PaperPort 11.0, whatever that is)?

Thanks,

Mike

Upvotes: 9

Views: 4403

Answers (5)

Hoang Anh Tuan

Reputation: 420

The previous answers didn't work for me: in some cases they return false for valid PDF data.

This works for me:

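import PDFKit

func isPDFData(data: Data) -> Bool {
    // PDFDocument(data:) returns nil when the data cannot be parsed as a PDF.
    return PDFDocument(data: data) != nil
}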

Upvotes: 1

NSExceptional

Reputation: 1390

Swift 4

extension Data {
    var isPDF: Bool {
        // Only data of at least 1024 bytes is considered.
        guard count >= 1024 else { return false }
        let pdfHeader = Data("%PDF".utf8)
        // Look for the marker anywhere in the first 1024 bytes.
        return range(of: pdfHeader, in: startIndex..<(startIndex + 1024)) != nil
    }
}
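
One thing to be aware of: the extension above returns false for any data shorter than 1024 bytes, even if it starts with %PDF. If that matters, a variant that clamps the search window to the data's length might look like the sketch below. The function name `hasPDFHeader` and its `window` parameter are hypothetical, not part of any framework.

```swift
import Foundation

/// Hypothetical helper: true when "%PDF" appears within the first `window` bytes.
func hasPDFHeader(_ data: Data, window: Int = 1024) -> Bool {
    let marker = Data("%PDF".utf8)
    // Clamp the search window to the data's actual length so short data is still checked.
    let end = data.index(data.startIndex, offsetBy: min(window, data.count))
    return data.range(of: marker, in: data.startIndex..<end) != nil
}

// Example: leading junk bytes before the header are tolerated.
let sample = Data([0x00, 0x01, 0x02]) + Data("%PDF-1.4\n".utf8)
print(hasPDFHeader(sample))
```

This only checks for the marker; it says nothing about whether the rest of the file parses.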

Upvotes: 5

Ullas Pujary

Reputation: 359

// `caption` is the app-specific folder name; the file to check is stored as "0" inside it.
let documentsPath = NSSearchPathForDirectoriesInDomains(.documentDirectory, .userDomainMask, true)[0]
let rootDirectory = "\(documentsPath)/\(caption!)/"
let fileURL = URL(fileURLWithPath: rootDirectory).appendingPathComponent("0")
var isPDF = false
// Only check files of at least 1024 bytes; a missing or unreadable file is treated as not a PDF.
if let ns = NSData(contentsOf: fileURL), ns.length >= 1024 {
    let pdfBytes: [UInt8] = [0x25, 0x50, 0x44, 0x46] // "%PDF"
    let pdfHeader = Data(pdfBytes)
    // Search the whole 1024-byte window; .anchored would only match a header at offset 0.
    let found = ns.range(of: pdfHeader, options: [], in: NSRange(location: 0, length: 1024))
    isPDF = found.location != NSNotFound
}

Upvotes: 4

Vignesh Kumar

Reputation: 598

Maybe try this:

    // Validate that an NSData object looks like a PDF
    - (BOOL)isValidPDF:(NSData *)pdfData {
        BOOL isPDF = NO;
        if (pdfData.length >= 1024) {
            NSUInteger startMetaCount = 4, endMetaCount = 5;
            // Markers expected in a PDF: "%PDF" near the start and "%%EOF" somewhere in the data
            NSData *startPDFData = [NSData dataWithBytes:"%PDF" length:startMetaCount];
            NSData *endPDFData = [NSData dataWithBytes:"%%EOF" length:endMetaCount];
            // Search the first 1024 bytes for the header and the whole data for the trailer
            NSRange startRange = [pdfData rangeOfData:startPDFData options:0 range:NSMakeRange(0, 1024)];
            NSRange endRange = [pdfData rangeOfData:endPDFData options:0 range:NSMakeRange(0, pdfData.length)];

            // Neither marker has to sit at a fixed offset; both only need to be present
            isPDF = (startRange.location != NSNotFound && startRange.length == startMetaCount &&
                     endRange.location != NSNotFound && endRange.length == endMetaCount);
        }
        return isPDF;
    }
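
For Swift users, the %%EOF half of this check could be sketched roughly as follows. Searching backwards within only the last 1024 bytes is my own assumption (the method above scans the whole data), and `hasPDFTrailer` and `window` are hypothetical names.

```swift
import Foundation

/// Hypothetical helper: true when "%%EOF" appears within the last `window` bytes.
func hasPDFTrailer(_ data: Data, window: Int = 1024) -> Bool {
    let marker = Data("%%EOF".utf8)
    // Start of the search window, clamped so short data is searched in full.
    let start = data.index(data.endIndex, offsetBy: -min(window, data.count))
    // Search from the end, since the trailer is expected near the end of the file.
    return data.range(of: marker, options: .backwards, in: start..<data.endIndex) != nil
}

// Example: a tiny stand-in for a PDF body followed by the end-of-file marker.
let tail = Data("%PDF-1.4\n1 0 obj\nendobj\n%%EOF\n".utf8)
print(hasPDFTrailer(tail))
```

Limiting the window keeps the check cheap on large files, at the cost of rejecting data with a lot of trailing junk after %%EOF.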

Upvotes: 3

Chris

Reputation: 2907

In Swift I have the following:

var isPDF = false
if assetData.length >= 1024 { // only check if bigger
    let pdfBytes: [UInt8] = [0x25, 0x50, 0x44, 0x46] // "%PDF"
    let pdfHeader = Data(pdfBytes)
    // assetData is an NSData; search its first 1024 bytes for the header
    let foundRange = assetData.range(of: pdfHeader, options: [], in: NSRange(location: 0, length: 1024))
    if foundRange.length > 0 {
        isPDF = true
    }
}

Upvotes: 3
