Ratatouille

Reputation: 1492

How to divide block in piece when they overlap

I'm looking to build a simple, minimal BitTorrent client and would like some input. I have been reading the protocol spec for 2-3 days now.

Here is my understanding of it so far. Assume the torrent has a piece length of 26000 bytes and, according to the unofficial spec, the block size is 16384 bytes.

A request for the first block of a piece would then look like this:

piece 0 
block offset 0
block length 16384

So far so good.

Now, for the next block, which overlaps pieces 0 and 1, what should the request look like?

piece 0  ## since the first byte is in piece 0, use piece 0 instead of piece 1
block offset 16384
block length 16384

Now, on the receiving end, I need to reassemble the 26000-byte piece so that I can compare its hash against the corresponding entry in pieces to verify the piece.

Is my understanding correct?

Also, suppose the piece verification fails, maybe because of the first block, i.e. Block 0 (which is faulty or corrupt). Then I would have to requeue both Block 0 and Block 1 (which was valid, by the way, and is also part of piece 1) to be transmitted again.

Suddenly the piece and block distribution becomes more complex than I assumed it would be, and I'm hoping there is a simpler solution to this.

Any thoughts?

Upvotes: 4

Views: 758

Answers (3)

Encombe

Reputation: 2089

I will use the more distinct term 'chunk' instead of the ambiguous 'block'.


  • A torrent is divided into pieces.

  • A piece is divided into chunks.

  • A chunk is cut from one piece.

A torrent is divided into pieces when it's created. With the Request message, a piece is in turn further divided into chunks by the downloading BitTorrent client.
How the client cuts the chunks out of a piece doesn't matter, as long as no single chunk is larger than 16 KB (16384 bytes).
The simplest and most rational way to divide a piece is to use as few chunks as possible: divide it into 16 KB chunks and let the last chunk of the piece be smaller if necessary.


The Request message format: <len=0013><id=6><Piece_index><Chunk_offset><Chunk_length>

  • <Piece_index > integer specifying the zero-based piece index
  • <Chunk_offset> integer specifying the zero-based byte offset within the piece
  • <Chunk_length> integer specifying the requested number of bytes
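
As a rough illustration of the format above, the message can be packed with Python's struct module. This is a minimal sketch, assuming each integer field is a 4-byte big-endian unsigned value as in the standard peer wire protocol; the function name build_request is made up for this example.

```python
import struct

def build_request(piece_index: int, chunk_offset: int, chunk_length: int) -> bytes:
    """Pack a Request message (hypothetical helper, not from any library).

    Layout: 4-byte length prefix (13), 1-byte message id (6),
    then three 4-byte big-endian unsigned integers.
    """
    return struct.pack(">IBIII", 13, 6, piece_index, chunk_offset, chunk_length)

# Second chunk of a 26000-byte piece: offset 16384, length 9616.
msg = build_request(0, 16384, 9616)
# The message is 17 bytes on the wire: 4-byte prefix + 13-byte payload.
```

The length prefix counts only the bytes that follow it, which is why it is 13 rather than 17.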



When requesting a chunk:

  • the whole chunk must be within the piece specified by the Piece_index,
    i.e. Chunk_offset+Chunk_length must be less than or equal to the size of that specific piece*.
  • the Chunk_length cannot be larger than 16 KB (16384 bytes) and must be at least 1 byte
  • the peer that gets the request must have the piece specified by the Piece_index

If any of the conditions is not met, the peer receiving the request will close the connection.

* For all pieces except the very last one that is the 'piece length' defined in the info-dictionary.
The size of the last piece can be calculated as:
size_last_piece = size_of_torrent - (number_of_pieces - 1) * 'piece length'
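
The rules above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the helper names (piece_size, split_piece, is_valid_request) are invented here, and the 60000-byte torrent size is a made-up figure chosen to go with the question's 26000-byte piece length.

```python
MAX_CHUNK = 16384  # 16 KB upper limit for a single request

def piece_size(index: int, total_size: int, piece_length: int) -> int:
    """Size of piece `index`; only the very last piece may be shorter."""
    number_of_pieces = -(-total_size // piece_length)  # ceiling division
    if index == number_of_pieces - 1:
        return total_size - (number_of_pieces - 1) * piece_length
    return piece_length

def split_piece(size: int, max_chunk: int = MAX_CHUNK) -> list:
    """Cut a piece into as few (offset, length) chunks as possible."""
    return [(off, min(max_chunk, size - off)) for off in range(0, size, max_chunk)]

def is_valid_request(offset: int, length: int, size: int, have_piece: bool) -> bool:
    """Apply the three conditions a receiving peer checks."""
    return (have_piece
            and 1 <= length <= MAX_CHUNK
            and offset >= 0
            and offset + length <= size)

# A 26000-byte piece yields two chunks; the second one is shorter.
chunks = split_piece(26000)                 # [(0, 16384), (16384, 9616)]
# The request from the question crosses the piece boundary and is rejected.
crosses = is_valid_request(16384, 16384, 26000, True)   # False
last = piece_size(2, 60000, 26000)          # 8000
```

Because every chunk is cut from exactly one piece, a request never spans two pieces, and a failed hash check only affects the chunks of that one piece.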

Upvotes: 2

the8472

Reputation: 43125

The maximum block size commonly accepted by clients is 16KiB. Clients are free to make smaller requests.

Pieces are commonly a multiple of 16KiB, but the current spec does not require it (this changes with BEP52) and some people use prime numbers or similar things for fun, so they do exist in the wild.

Blocks only exist in the sense that you need multiple requests to get a complete piece that is larger than 16KiB. In other words, blocks are the same thing as whatever you decide to request. You could request 500 bytes, then 1017 bytes and then 13016 bytes, ... until you got a complete piece. They are arbitrary subdivisions within a piece - there is no overlap - that you need to keep track of between the start of downloading a piece and finishing the piece.
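
One possible way to keep track of such arbitrary subdivisions is to record the byte ranges received so far and merge them as they arrive. The sketch below is just one approach; the class name RangeTracker and its methods are invented for illustration.

```python
class RangeTracker:
    """Tracks which byte ranges of a single piece have been received."""

    def __init__(self, piece_size: int):
        self.piece_size = piece_size
        self.ranges = []  # sorted, non-touching half-open (start, end) ranges

    def add(self, offset: int, length: int) -> None:
        start, end = offset, offset + length
        merged = []
        for s, e in self.ranges:
            if e < start or s > end:
                merged.append((s, e))        # no contact with the new range
            else:
                start, end = min(start, s), max(end, e)  # absorb it
        merged.append((start, end))
        merged.sort()
        self.ranges = merged

    def is_complete(self) -> bool:
        return self.ranges == [(0, self.piece_size)]

# The odd request sizes from the text, followed by the rest of a 26000-byte piece.
tracker = RangeTracker(26000)
tracker.add(0, 500)
tracker.add(500, 1017)
tracker.add(1517, 13016)
tracker.add(14533, 11467)
done = tracker.is_complete()
```

Once is_complete() returns true, the piece can be hashed as a whole; the individual request sizes no longer matter.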

They do not participate in hashing, they do not factor into the HAVE or BITFIELD messages. Only REQUEST, PIECE, CANCEL and REJECT messages concern themselves with blocks. And instead of blocks you could also call them sub-piece offset-length tuples or something to that effect.

Upvotes: -1

Andrei Tomashpolskiy

Reputation: 170

The last block in a piece may be smaller than the transfer block size, i.e. 26000 - 16384 = 9616 bytes should be requested in the second REQUEST message. As soon as all 26000 bytes have been received, the SHA-1 hash should be calculated and compared with the corresponding checksum from the pieces section of the metainfo dictionary. If the checksum does not match, you have no means of knowing which block contained invalid data, and you should re-download all blocks of this piece.
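
The verification step could look roughly like this in Python. It is a minimal sketch, assuming the pieces value is the usual concatenation of 20-byte SHA-1 digests, one per piece; the function name piece_is_valid is made up for the example.

```python
import hashlib

SHA1_LEN = 20  # each entry in the pieces string is a 20-byte SHA-1 digest

def piece_is_valid(piece_data: bytes, pieces_field: bytes, piece_index: int) -> bool:
    """Compare the SHA-1 of a fully received piece with its expected digest."""
    expected = pieces_field[piece_index * SHA1_LEN:(piece_index + 1) * SHA1_LEN]
    return hashlib.sha1(piece_data).digest() == expected

# Stand-in data: a fake 26000-byte piece and a pieces string built from it.
data = bytes(26000)
pieces_field = hashlib.sha1(data).digest()
ok = piece_is_valid(data, pieces_field, 0)
corrupted = piece_is_valid(b"\x01" + data[1:], pieces_field, 0)
```

A single flipped byte anywhere in the piece changes the digest, which is why the whole piece has to be requested again on a mismatch.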

My advice would be not to depend on any particular partitioning of the piece, because: 1) peers may use a different transfer block size when requesting data; 2) the SHA-1 algorithm is block-based, and the digester is better fed with a bigger block size (otherwise calculations will take more time).

A proper abstraction for a piece would be a generic data range with the following methods:

  • read(from:int, length:int):byte[]
  • write(offset:int, block:byte[]):()

Then you'll be able to read/write arbitrary subranges of data.
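
A bare-bones version of such a data range might look like the following Python sketch. The class name PieceBuffer is invented here, and the sketch keeps the whole piece in memory and does not track which ranges have been filled; both are simplifying assumptions.

```python
class PieceBuffer:
    """In-memory data range for one piece with arbitrary sub-range access."""

    def __init__(self, size: int):
        self._data = bytearray(size)

    def write(self, offset: int, block: bytes) -> None:
        # Reject writes that would fall outside the piece.
        if offset < 0 or offset + len(block) > len(self._data):
            raise ValueError("block does not fit inside the piece")
        self._data[offset:offset + len(block)] = block

    def read(self, start: int, length: int) -> bytes:
        # Reject reads that would fall outside the piece.
        if start < 0 or length < 0 or start + length > len(self._data):
            raise ValueError("range is outside the piece")
        return bytes(self._data[start:start + length])

# Blocks may arrive in any order and in any size.
piece = PieceBuffer(26000)
piece.write(16384, b"\x02" * 9616)
piece.write(0, b"\x01" * 16384)
whole = piece.read(0, 26000)  # ready to be hashed in one go
```

Reading the whole range back in one call also lets the hash be computed over a single large buffer rather than block by block.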

Upvotes: -1
