david furst

Reputation: 359

dd: reading binary file as blocks of size N returned less data than N

I need to process large binary files in segments. In concept this is similar to split, but instead of writing each segment to a file, I need to send each segment as the input of another process. I thought I could use dd to read and write the file in chunks, but the results aren't at all what I expected. For example, if I try:

dd if=some_big_file bs=1M |
    while : ; do
        dd bs=1M count=1 | processor
    done

... each output is actually 131,072 bytes, not 1,048,576.

Could anyone tell me why the output isn't blocked into 1M chunks, and how I could better accomplish what I'm trying to do?

Thanks.

Upvotes: 1

Views: 668

Answers (2)

Socowi

Reputation: 27370

First of all, you don't need the first dd. Piping with cat file | while ... or redirecting with while ...; done < file would do the trick as well.

dd bs=1M count=1 might return less than 1M, because count= counts read() calls and a read() from a pipe can return early; see When is dd suitable for copying data? (or, when are read() and write() partial)

Instead of dd count=… use head with the (non-POSIX) option -c ….

file=some_big_file
(( m = 1024 ** 2 ))
(( blocks = ($(stat -c %s "$file") + m - 1) / m ))
for ((i=0; i<blocks; ++i)); do
  head -c "$m" | processor
done < "$file"

Or, using only POSIX utilities in the pipeline (the loop itself still relies on bash's read -N), but very inefficient:

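# Each input byte becomes a 4-character octal escape (\ooo), so 1 MiB of input is 4 MiB of text.
(( octM = 4 * 1024 * 1024 ))
# The || test lets the short final block through: read fails at end of input but still fills block.
someCommand | od -v -to1 -An | tr -d \\n | tr ' ' '\\' |
while IFS= read -rN $octM block || [[ -n $block ]]; do
  printf %b "$block" | processor
done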

Upvotes: 3

pynexj

Reputation: 20797

According to dd's manual:

  • bs=bytes

    [...] if no data-transforming conv option is specified, input is copied to the output as soon as it's read, even if it is smaller than the block size.

So try with dd iflag=fullblock:

  • fullblock

    Accumulate full blocks from input. The read system call may return early if a full block is not available. When that happens, continue calling read to fill the remainder of the block. This flag can be used only with iflag. This flag is useful with pipes for example as they may return short reads. In that case, this flag is needed to ensure that a count= argument is interpreted as a block count rather than a count of read operations.
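
Applied to the loop from the question, that could look roughly like the sketch below. It is only an illustration under a few assumptions: process_in_chunks is a made-up helper name, the input is a regular file whose size wc -c can report, and dd is GNU dd (iflag=fullblock is not in POSIX). The block count is computed up front so the loop stops at end of file instead of running forever.

```shell
# Hypothetical helper: process_in_chunks FILE CHUNK_BYTES COMMAND [ARGS...]
# Runs COMMAND once per chunk, with the chunk on its standard input.
# Assumes GNU dd, since iflag=fullblock is not part of POSIX dd.
process_in_chunks() (
    file=$1
    chunk=$2
    shift 2

    # Number of chunks, rounding up so a short final chunk is included.
    size=$(wc -c < "$file")
    blocks=$(( (size + chunk - 1) / chunk ))

    i=0
    while [ "$i" -lt "$blocks" ]; do
        # Every dd shares the loop's stdin, so each one resumes where the
        # previous one stopped. fullblock makes dd keep reading until it has
        # a whole block (or reaches end of file) rather than passing on a
        # short read.
        dd bs="$chunk" count=1 iflag=fullblock 2>/dev/null | "$@"
        i=$((i + 1))
    done < "$file"
)
```

For the case in the question this would be called as process_in_chunks some_big_file 1048576 processor. The function body is a subshell, so its variables do not leak into the calling shell.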

Upvotes: 3
