iOS Gamer

Reputation: 496

How can I hash a file on iOS using Swift 3?

I have a number of files that will live on a server. Users can create files of this kind (plists) on-device, which are then uploaded to that server (CloudKit). I would like to unique them by content (the uniquing method should be resilient to variations in creation date). My understanding is that I should hash these files to obtain unique file names for them. My questions are:

  1. Is my understanding correct that what I want is a hash function?
  2. Which function should I use (from CommonCrypto)?
  3. Is what I need a digest?
  4. How would I go about it in code? (I assume this should be hashed over an NSData instance?). My understanding from googling around is that I need a bridging header include but beyond that the use of CommonCrypto baffles me. If there is a simpler way using first-party APIs (Apple) I am all ears (I want to avoid using third party code as much as possible).

Thanks so much!

Upvotes: 11

Views: 7535

Answers (4)

Apptek Studios

Reputation: 742

An update using Apple's CryptoKit: You could use a FileHandle to read the data in chunks, and pass these into the hasher:

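import CryptoKit
import Foundation

func getSHA256(forFile url: URL) throws -> SHA256.Digest {
    let handle = try FileHandle(forReadingFrom: url)
    // Close the file when the function exits, on success or failure.
    defer { try? handle.close() }

    var hasher = SHA256()
    // Feed the file to the hasher chunk by chunk; the autorelease pool
    // releases each chunk before the next one is read.
    while autoreleasepool(invoking: { () -> Bool in
        let nextChunk = handle.readData(ofLength: SHA256.blockByteCount)
        guard !nextChunk.isEmpty else { return false }
        hasher.update(data: nextChunk)
        return true
    }) { }
    return hasher.finalize()
}

// Here's how to convert the returned digest to string form:
// let hex = digest.map { String(format: "%02hhx", $0) }.joined()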
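
For content that is already in memory, a one-shot hash plus hex conversion may be enough. The following is a minimal sketch, assuming iOS 13 or later (where CryptoKit is available); `hexString` is a hypothetical helper name, not part of CryptoKit.

```swift
import CryptoKit
import Foundation

// Hypothetical helper: lowercase hex for any byte sequence
// (Data, [UInt8], or a CryptoKit digest such as SHA256.Digest).
func hexString<S: Sequence>(_ bytes: S) -> String where S.Element == UInt8 {
    return bytes.map { String(format: "%02x", $0) }.joined()
}

// One-shot hashing of in-memory data; no FileHandle or chunking needed.
let digest = SHA256.hash(data: Data("abc".utf8))
let name = hexString(digest)   // 64 hex characters for SHA-256
print(name)
```

Because a CryptoKit digest is a sequence of bytes, the same helper also works on the value returned by a chunked file hasher.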

Upvotes: 3

DesignatedNerd

Reputation: 2534

As of Swift 5, @chriswillow's answer is still basically correct, but there were some updates to withUnsafeBytes/withUnsafeMutableBytes. These updates make the methods more type-safe, but also moderately more annoying to use.

For the bit using withUnsafeBytes, use:

_ = data.withUnsafeBytes { bytesFromBuffer -> Int32 in
  guard let rawBytes = bytesFromBuffer.bindMemory(to: UInt8.self).baseAddress else {
    return Int32(kCCMemoryFailure)
  }

  return CC_SHA256_Update(&context, rawBytes, numericCast(data.count))
}

For the bit generating the final digest data, use:

var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
_ = digestData.withUnsafeMutableBytes { bytesFromDigest -> Int32 in
  guard let rawBytes = bytesFromDigest.bindMemory(to: UInt8.self).baseAddress else {
    return Int32(kCCMemoryFailure)
  }

  return CC_SHA256_Final(rawBytes, &context)
}
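
Put together, the two fragments above could sit in a chunked file hasher along these lines. This is only a sketch under a few assumptions: Swift 5, a toolchain where `import CommonCrypto` works directly (Xcode 10 or later), and a hypothetical function name and `bufferSize` parameter chosen for illustration.

```swift
import Foundation
import CommonCrypto

// Hypothetical name; hashes a file in chunks so it never sits in memory whole.
func sha256Digest(ofFileAt url: URL, bufferSize: Int = 1024 * 1024) throws -> Data {
    let file = try FileHandle(forReadingFrom: url)
    defer { file.closeFile() }

    var context = CC_SHA256_CTX()
    CC_SHA256_Init(&context)

    // Read up to `bufferSize` bytes at a time until EOF, updating the context.
    while autoreleasepool(invoking: { () -> Bool in
        let data = file.readData(ofLength: bufferSize)
        guard !data.isEmpty else { return false }
        data.withUnsafeBytes { buffer in
            // `buffer` is an UnsafeRawBufferPointer in Swift 5.
            _ = CC_SHA256_Update(&context, buffer.baseAddress, CC_LONG(buffer.count))
        }
        return true
    }) { }

    // Write the final 32-byte digest into a Data value.
    var digest = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
    digest.withUnsafeMutableBytes { buffer in
        _ = CC_SHA256_Final(buffer.bindMemory(to: UInt8.self).baseAddress, &context)
    }
    return digest
}
```

Throwing instead of returning an optional is a design choice of this sketch; it lets the caller decide how to report a file that cannot be opened.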

Upvotes: 3

christopher.online

Reputation: 2774

A solution that also works for large files, because it does not require the whole file to be in memory:

func sha256(url: URL) -> Data? {
    do {
        let bufferSize = 1024 * 1024
        // Open file for reading:
        let file = try FileHandle(forReadingFrom: url)
        defer {
            file.closeFile()
        }

        // Create and initialize SHA256 context:
        var context = CC_SHA256_CTX()
        CC_SHA256_Init(&context)

        // Read up to `bufferSize` bytes, until EOF is reached, and update SHA256 context:
        while autoreleasepool(invoking: {
            // Read up to `bufferSize` bytes
            let data = file.readData(ofLength: bufferSize)
            if data.count > 0 {
                data.withUnsafeBytes {
                    _ = CC_SHA256_Update(&context, $0, numericCast(data.count))
                }
                // Continue
                return true
            } else {
                // End of file
                return false
            }
        }) { }

        // Compute the SHA256 digest:
        var digest = Data(count: Int(CC_SHA256_DIGEST_LENGTH))
        digest.withUnsafeMutableBytes {
            _ = CC_SHA256_Final($0, &context)
        }

        return digest
    } catch {
        print(error)
        return nil
    }
}

Usage, assuming fileURL is a previously created instance of URL:

if let digestData = sha256(url: fileURL) {
    let calculatedHash = digestData.map { String(format: "%02hhx", $0) }.joined()
    DDLogDebug(calculatedHash)
}

Upvotes: 10

zaph

Reputation: 112865

Create a cryptographic hash of each file and use that for uniqueness comparisons. SHA-256 is a current hash function, and on iOS with Common Crypto it is quite fast: on an iPhone 6S, SHA-256 processes about 1 GB/second, not counting I/O time. If you need fewer bytes, just truncate the hash.
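
As a rough illustration of truncating a digest and turning it into a file name, here is a minimal sketch. The function name, the 16-byte default, and the `plist` extension are assumptions made for this example; keep in mind that a shorter prefix makes collisions more likely.

```swift
import Foundation

// Hypothetical helper: builds a file name from the first `byteCount` bytes of a digest.
func contentFileName(fromDigest digest: Data,
                     byteCount: Int = 16,
                     fileExtension: String = "plist") -> String {
    // prefix(_:) returns at most `byteCount` bytes, or the whole digest if it is shorter.
    let truncated = digest.prefix(max(0, byteCount))
    let hex = truncated.map { String(format: "%02x", $0) }.joined()
    return "\(hex).\(fileExtension)"
}

// Example with the first bytes of a digest.
let sampleDigest = Data([0x4a, 0xcf, 0x0b, 0x39, 0xd9, 0xc4])
print(contentFileName(fromDigest: sampleDigest, byteCount: 4))  // prints "4acf0b39.plist"
```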

An example using Common Crypto (Swift 3):

For hashing a string:

func sha256(string: String) -> Data {
    let messageData = string.data(using:String.Encoding.utf8)!
    var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))

    _ = digestData.withUnsafeMutableBytes {digestBytes in
        messageData.withUnsafeBytes {messageBytes in
            CC_SHA256(messageBytes, CC_LONG(messageData.count), digestBytes)
        }
    }
    return digestData
}
let testString = "testString"
let testHash = sha256(string:testString)
print("testHash: \(testHash.map { String(format: "%02hhx", $0) }.joined())")

let testHashBase64 = testHash.base64EncodedString()
print("testHashBase64: \(testHashBase64)")

Output:
testHash: 4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3
testHashBase64: Ss8LOdnEdmcJo2ifVTrAGrVQVF/6RUTfwLLOqC+6AqM=

Note: Add to your Bridging Header:

#import <CommonCrypto/CommonCrypto.h>

For hashing data:

func sha256(data: Data) -> Data {
    var digestData = Data(count: Int(CC_SHA256_DIGEST_LENGTH))

    _ = digestData.withUnsafeMutableBytes {digestBytes in
        data.withUnsafeBytes {messageBytes in
            CC_SHA256(messageBytes, CC_LONG(data.count), digestBytes)
        }
    }
    return digestData
}

let testData: Data = "testString".data(using: .utf8)!
print("testData: \(testData.map { String(format: "%02hhx", $0) }.joined())")
let testHash = sha256(data:testData)
print("testHash: \(testHash.map { String(format: "%02hhx", $0) }.joined())")

Output:
testData: 74657374537472696e67
testHash: 4acf0b39d9c4766709a3689f553ac01ab550545ffa4544dfc0b2cea82fba02a3
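
On newer toolchains the typed-pointer form of withUnsafeBytes used above is deprecated. A possible Swift 5 variant of the data-hashing function is sketched below, assuming Xcode 10 or later, where `import CommonCrypto` works without a bridging header; the function name `sha256Swift5` is made up here to avoid clashing with the code above.

```swift
import Foundation
import CommonCrypto

// Hypothetical Swift 5 variant of sha256(data:) using raw buffer pointers.
func sha256Swift5(data: Data) -> Data {
    var digest = [UInt8](repeating: 0, count: Int(CC_SHA256_DIGEST_LENGTH))
    data.withUnsafeBytes { buffer in
        // `buffer` is an UnsafeRawBufferPointer; pass its base address and byte count.
        _ = CC_SHA256(buffer.baseAddress, CC_LONG(buffer.count), &digest)
    }
    return Data(digest)
}

let sample = sha256Swift5(data: Data("abc".utf8))
print(sample.map { String(format: "%02hhx", $0) }.joined())
```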

Also see Martin's link.

Upvotes: 14
