Travis Gockel
Travis Gockel

Reputation: 27633

Backing out data from an MD5 checksum

Imagine you have an MD5 sum that was calculated from an array of N 64-byte elements. I want to replace an element at an arbitrary index in the source array with a new element. Then, instead of recalculating the MD5 sum by re-running it through an MD5 function, I would like to "subtract" the old element from the result and "add" the new piece of data to it.

To be a bit more clear, here's some pseudo-Scala:

class Block {
  var summary: MD5Result

  // The key reason behind this question is that the elements might not be
  // loaded. With a large array, it can get expensive to load everything just to
  // update a single thing.
  var data: Array[Option[Element]]

  def replaceElement(block: Block, index: Integer, newElement: Element) = {
    // we can know the element that we're replacing
    val oldElement = block.data(index) match {
        case Some(x) => x
        case None    => loadData(index) // <- this is expensive
      }

    // update the MD5 using this magic function
    summary = replaceMD5(summary, index, oldElement, newElement)
  }
}

Is replaceMD5 implementable? While all signs point to "this is breaking a (weak) cryptographic hash," the actual MD5 algorithm seems to support doing this (but I might be missing something obvious).

Upvotes: 0

Views: 163

Answers (1)

j_random_hacker
j_random_hacker

Reputation: 51226

I think I better understand what you want to do now. My solution below assumes nothing about MD5 computation, but involves a tradeoff between IO and storing a large number of MD5 hashes. Instead of computing the simple MD5 hash of the entire dataset, it computes a different MD5 hash that nevertheless should have the same important property: that any change to any element (drastically) changes it.

  1. At the outset, decide on a block size b such that
    • you can afford to read b values from disk (or whatever IO you're talking about) per change of element, and
    • you can afford to keep 2n/b MD5 hashes in memory.
  2. Create a binary tree of MD5 hashes. Each leaf in this tree will be the MD5 hash of a size-b block. Each internal node is the MD5 hash of its two children. We will use the hash of the root of this tree as "the" MD5 hash.
  3. When element i changes, read in the b elements in block RoundDown(i/b), compute the new MD5 hash for this, and then propagate the changes up the tree (this will take at most log2(n) steps).

Upvotes: 1

Related Questions