Reputation: 359
I need to process large binary files in segments. Conceptually this would be similar to split, but instead of writing each segment to a file, I need to take that segment and send it as the input of another process. I thought I could use dd to read/write the file in chunks, but the results aren't at all what I expected. For example, if I try:
dd if=some_big_file bs=1M |
while : ; do
dd bs=1M count=1 | processor
done
... the output sizes are actually 131,072 bytes and not 1,048,576.
Could anyone tell me why I'm not seeing the output blocked into 1M chunks, and how I could better accomplish what I'm trying to do?
Thanks.
Upvotes: 1
Views: 668
Reputation: 27370
First of all, you don't need the first dd. A cat file | while pipeline or a done < file redirection would do the trick as well.
dd bs=1M count=1 might return less than 1M; see When is dd suitable for copying data? (or, when are read() and write() partial).
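The short read is easy to reproduce; the exact number depends on your system's pipe buffer:

cat /dev/zero | dd bs=1M count=1 2>/dev/null | wc -c    # prints e.g. 65536 or 131072, not 1048576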
Instead of dd count=…, use head with the (non-POSIX) option -c ….
file=some_big_file
(( m = 1024 ** 2 ))                                  # 1 MiB block size
(( blocks = ($(stat -c %s "$file") + m - 1) / m ))   # number of blocks, rounded up (GNU stat prints the size in bytes)
for ((i = 0; i < blocks; ++i)); do
    head -c "$m" | processor                         # each head call consumes up to the next 1 MiB of the file
done < "$file"
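To check the chunking before plugging in the real consumer, a stand-in for processor that only reports sizes is enough (processor here is hypothetical; it is whatever your real command is):

processor() { wc -c; }    # test stand-in: print the size of each chunk

With that, every iteration should print 1048576, except possibly the last one, which prints whatever is left of the file.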
Or, using only POSIX utilities instead of the non-POSIX head -c, but very inefficiently:
(( octM = 5 * 1024 * 1024 ))   # 1 MiB of data; each byte becomes the 5 characters \0nnn
someCommand | od -v -to1 -An | sed 's/ /\\0/g' | tr -d '\n' |
while IFS= read -rN "$octM" block; do
    printf %b "$block" | processor   # %b turns the \0nnn octal escapes back into raw bytes
done
Upvotes: 3
Reputation: 20797
According to dd's manual:
bs=bytes
[...] if no data-transforming conv option is specified, input is copied to the output as soon as it's read, even if it is smaller than the block size.
So try with dd iflag=fullblock:
fullblock
Accumulate full blocks from input. The read system call may return early if a full block is not available. When that happens, continue calling read to fill the remainder of the block. This flag can be used only with iflag. This flag is useful with pipes for example as they may return short reads. In that case, this flag is needed to ensure that a count= argument is interpreted as a block count rather than a count of read operations.
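Applied to the loop from the question, it would look something like this (a sketch: the flag only fixes the chunk size; the while : loop from the question still has no termination condition of its own, so it keeps running after the input is exhausted):

dd if=some_big_file bs=1M |
while : ; do
    # fullblock makes the inner dd keep calling read() until it has a full 1M
    dd bs=1M count=1 iflag=fullblock 2>/dev/null | processor
done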
Upvotes: 3