michaeluskov

Reputation: 1828

Read a file by bytes in BASH

I need to read the first byte of a file I specified, then the second byte, the third, and so on. How could I do it in Bash? P.S. I need to get the hex values of these bytes.

Upvotes: 26

Views: 66849

Answers (7)

F. Hauri  - Give Up GitHub

Reputation: 70772

Full rewrite: September 2019!

A lot shorter and simpler than previous versions! (Somewhat faster, but not by much.)

Yes, bash can read and write binary:

Syntax:

LANG=C IFS= read -r -d '' -n 1 foo

will populate $foo with one binary byte. Unfortunately, as bash strings cannot hold null bytes ($'\0'), reading one byte at a time is required.

If the read command succeeds and $foo is empty, then the byte read is NUL. Otherwise, $foo holds the byte read, in binary form.

Then, for the value of the byte read (I had missed this in man bash; have a look at the 2016 post at the bottom of this answer ;b):
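A minimal sketch of that rule, reading any stream byte by byte and spotting the NUL bytes. It assumes bash 4 or newer; show_bytes is a hypothetical name, not something from this answer.

```shell
#!/bin/bash
# show_bytes (hypothetical name): print each byte of STDIN on its own line,
# writing NUL for the bytes that bash cannot store in a variable.
show_bytes() {
    local byte
    while LANG=C IFS= read -r -d '' -n 1 byte; do
        if [[ -z $byte ]]; then
            echo NUL                # read succeeded but stored nothing: it was $'\0'
        else
            printf '%q\n' "$byte"   # %q makes control characters visible
        fi
    done
}

printf 'A\0B' | show_bytes          # A, NUL, B on three lines
```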

printf -v value %d \'$byte

From man bash:
 printf [-v var] format [arguments]
 ...
     Arguments to non-string format specifiers are treated as C constants,
     except that ..., and if  the leading character is a  single or double
     quote, the value is the ASCII value of the following character.
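For example, the quote trick gives both the decimal and the hex value of a character without any fork. A small sketch with a fixed character; note that under a UTF-8 locale a multibyte character would give its code point rather than a single byte value, which is why the functions below force LANG=C.

```shell
#!/bin/bash
# A leading quote makes printf use the character's code instead of parsing a number.
printf -v dec %d "'A"      # dec=65
printf -v hex %02x "'A"    # hex=41, the hex value the question asks for
echo "$dec $hex"
```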

So:

read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v $_r8_var %d \'$_r8_car
}

This will populate the submitted variable name (default $OUTBIN) with the decimal ASCII value of the first byte from STDIN.
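Since read8 returns the status of printf rather than of read, it cannot tell when the input ends, so a loop over a whole file needs the byte count up front. A sketch under that assumption, answering the original question (hex of every byte); hexfile is a hypothetical helper, and read8 is restated with its argument quoted as a whole:

```shell
#!/bin/bash
# read8 from this answer, with "'$_r8_car" quoted as a whole.
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car"
}

# hexfile (hypothetical helper): print every byte of file $1 as two hex digits.
# read8 returns printf's status, not read's, so it cannot report EOF;
# the byte count from wc -c bounds the loop instead.
hexfile() {
    local n i v
    n=$(wc -c < "$1")
    for (( i = 0; i < n; i++ )); do
        read8 v
        printf '%02x ' "$v"
    done < "$1"
    echo
}

# Usage: hexfile /path/to/file
```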

read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb &&
    read8 _r16_hb
    printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb ))
}

This will populate the submitted variable name (default $OUTBIN) with the decimal value of the first 16-bit word from STDIN (little endian)...

Of course, to switch endianness, you have to swap the two reads:

    read8 _r16_hb &&
    read8 _r16_lb
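A quick check of both byte orders on the two bytes 34 12. This sketch restates read8 and read16 from above and adds read16be, a hypothetical big-endian twin that simply swaps the two reads:

```shell
#!/bin/bash
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car"
}
read16() {      # little endian: low byte first
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb && read8 _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}
# read16be (hypothetical name): big endian, high byte first.
read16be() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_hb && read8 _r16_lb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}

read16   le < <(printf '\x34\x12')   # le=4660  (0x1234)
read16be be < <(printf '\x34\x12')   # be=13330 (0x3412)
echo "$le $be"
```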

And so on:

# Usage:
#       read[8|16|32|64] [varname] < binaryStdInput

read8() {  local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v $_r8_var %d "'"$_r8_car ;}
read16() { local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8  _r16_lb && read8  _r16_hb
    printf -v $_r16_var %d $(( _r16_hb<<8 | _r16_lb )) ;}
read32() { local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v $_r32_var %d $(( _r32_hw<<16| _r32_lw )) ;}
read64() { local _r64_var=${1:-OUTBIN} _r64_ll _r64_hl
    read32 _r64_ll && read32 _r64_hl
    printf -v $_r64_var %d $(( _r64_hl<<32| _r64_ll )) ;}
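Because each call consumes its bytes from the same STDIN, consecutive calls walk through a binary header field by field. A sketch with a made-up 6-byte header (a 16-bit field, then a 32-bit one, little endian, including NUL bytes); the field names magic and size are purely illustrative:

```shell
#!/bin/bash
read8() {
    local _r8_var=${1:-OUTBIN} _r8_car LANG=C IFS=
    read -r -d '' -n 1 _r8_car
    printf -v "$_r8_var" %d "'$_r8_car"     # a NUL byte leaves _r8_car empty -> 0
}
read16() {
    local _r16_var=${1:-OUTBIN} _r16_lb _r16_hb
    read8 _r16_lb && read8 _r16_hb
    printf -v "$_r16_var" %d $(( _r16_hb<<8 | _r16_lb ))
}
read32() {
    local _r32_var=${1:-OUTBIN} _r32_lw _r32_hw
    read16 _r32_lw && read16 _r32_hw
    printf -v "$_r32_var" %d $(( _r32_hw<<16 | _r32_lw ))
}

# The { } group runs in the current shell, so magic and size survive it.
{ read16 magic; read32 size; } < <(printf '\x4d\x5a\x00\x10\x00\x00')
printf '%#06x %d\n' "$magic" "$size"    # 0x5a4d 4096
```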

Sample: playing with GPT partition tables.

So you could source this; then, if your /dev/sda is GPT partitioned:

read totsize < <(blockdev --getsz /dev/sda)
read64 gptbackup < <(dd if=/dev/sda bs=8 skip=68 count=1 2>/dev/null)
echo $((totsize-gptbackup))
1

The answer should be 1. (The 1st GPT header is at sector 1, and one sector is 512 bytes. The backup GPT location is stored at byte 32 of the header. With bs=8: 512 bytes -> 64 blocks, plus 32 bytes -> 4 blocks, so 544 bytes -> 68 blocks to skip. The backup GPT is located at the end of the disk (disk size - 1 block)... See GUID Partition Table at Wikipedia.)
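The skip value is plain arithmetic, so it can be computed instead of worked out by hand. A sketch, assuming 512-byte sectors as in the example; gpt_skip is a hypothetical helper, and the byte offset must be a multiple of bs for the result to be exact. For a header at sector N it gives N*64+4, the same skip as in the next command.

```shell
#!/bin/bash
# gpt_skip (hypothetical helper): number of bs-sized blocks dd must skip to reach
# byte `offset` of a GPT header stored at sector `lba` (512-byte sectors assumed).
gpt_skip() {
    local lba=$1 offset=$2 bs=${3:-8}
    echo $(( (lba * 512 + offset) / bs ))
}

gpt_skip 1 32      # primary header at LBA 1, backup-LBA field at byte 32 -> 68
```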

Then

read64 gptbackup2 < <(
   dd if=/dev/sda bs=8 skip=$((4+gptbackup*64)) count=1 2>/dev/null)
echo $gptbackup2 
1

The answer should be 1 (the 2nd GPT header, located at the end of the disk, holds the location of the 1st GPT header, which is located at sector 1).

A quick, small write function...

write () { 
    local i=$((${2:-64}/8)) o= v r
    r=$((i-1))
    for ((;i--;)) {
        printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
        o+=$v
    }
    printf "$o"
}

This function defaults to 64 bits, little endian.

Usage: write <integer> [bits:64|32|16|8] [switch to big endian]
  • With two parameters, the second parameter must be one of 8, 16, 32 or 64: the bit length of the generated output.
  • With any dummy 3rd parameter (even an empty string), the function will switch to big endian.
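To check the byte order, the output of write can be piped through od. A sketch restating write from above; the value 258 (0x0102) is arbitrary:

```shell
#!/bin/bash
# write, as given in this answer.
write () {
    local i=$((${2:-64}/8)) o= v r
    r=$((i-1))
    for ((;i--;)) {
        # builds one \ooo octal escape per byte
        printf -vv '\%03o' $(( ($1>>8*(0${3+-1}?i:r-i))&255 ))
        o+=$v
    }
    printf "$o"
}

write 258 16    | od -An -tx1    # little endian: 02 01
write 258 16 '' | od -An -tx1    # dummy 3rd argument, big endian: 01 02
```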


read64 foo < <(write -12345);echo $foo
-12345

...

First post 2015...

Upgrade: adding a bash-specific version (with bashisms)

With the new version of the printf builtin, you can do a lot without having to fork ($(...)), which makes your script a lot faster.
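The difference in a nutshell, as a small sketch (the value 255 is arbitrary):

```shell
#!/bin/bash
hex_fork=$(printf '%02x' 255)    # $(...) forks a subshell for every call
printf -v hex_nofork '%02x' 255  # printf -v assigns directly in the current shell
echo "$hex_fork $hex_nofork"     # both hold ff
```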

First, let's see (by using seq and sed) how to parse hd output:

echo ;sed <(seq -f %02g 0 $(( COLUMNS-1 )) ) -ne '
    /0$/{s/^\(.*\)0$/\o0337\o033[A\1\o03380/;H;};
    /[1-9]$/{s/^.*\(.\)/\1/;H};
    ${x;s/\n//g;p}';hd < <(echo Hello good world!)
0         1         2         3         4         5         6         7
012345678901234567890123456789012345678901234567890123456789012345678901234567
00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|
00000010  21 0a                                             |!.|
00000012

The hexadecimal part begins at column 10 and ends at column 56; each hex pair takes 3 chars, with one extra space at column 34.
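With that layout, the 16 hex pairs of one line can be pulled into an array using the same ${line:10:48} slice as the loop. A sketch on a fixed copy of the first hd line shown above, so hd itself is not needed:

```shell
#!/bin/bash
line='00000000  48 65 6c 6c 6f 20 67 6f  6f 64 20 77 6f 72 6c 64  |Hello good world|'

# Word splitting on the 48-char slice drops the extra space between the two groups.
read -r -a bytes <<< "${line:10:48}"
echo "${#bytes[@]} bytes, first ${bytes[0]}, last ${bytes[15]}"
```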

So parsing this could be done by:

while read line ;do
    for x in ${line:10:48};do
        printf -v x \\%o 0x$x
        printf $x
      done
  done < <( ls -l --color | hd )

Old original post

Edit 2: for hexadecimal, you could use hd:

echo Hello world | hd
00000000  48 65 6c 6c 6f 20 77 6f  72 6c 64 0a              |Hello world.|

or od

echo Hello world | od -t x1 -t c
0000000  48  65  6c  6c  6f  20  77  6f  72  6c  64  0a
          H   e   l   l   o       w   o   r   l   d  \n

In short:

while IFS= read -r -n1 car;do [ "$car" ] && echo -n "$car" || echo ; done

Try it:

while IFS= read -rn1 c;do [ "$c" ]&&echo -n "$c"||echo;done < <(ls -l --color)

Explanation:

while IFS= read -rn1 car  # empty the Internal Field Separator so every char is read
    do [ "$car" ] &&      # Test if something was read
        echo -n "$car" || # then echo it
        echo              # Else, it was an end-of-line, so print one
  done

Edit: the question was edited; hex values are needed!?

od -An -t x1 | while read line;do for char in $line;do echo $char;done ;done
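One caveat: by default od (and hd as well) replaces runs of identical 16-byte lines with a single * line, so such a pipeline silently loses bytes on repetitive data. Adding -v prints every line. A sketch, with hexlist as a hypothetical function name:

```shell
#!/bin/bash
# hexlist (hypothetical name): one hex pair per line for everything on STDIN.
# -v stops od from collapsing repeated lines into a single '*' line.
hexlist() {
    od -An -v -t x1 | while read -r line; do
        for char in $line; do
            echo "$char"
        done
    done
}

printf '%032d' 0 | hexlist | wc -l    # 32 identical bytes -> 32 lines
```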

Demo:

od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
    while read line;do                    # Read line of HEX pairs
        for char in $line;do              # For each pair
            printf "\x$char"              # Translate HEX back to binary and print
      done
  done

Demo 2: We have both hex and binary

od -An -t x1 < <(ls -l --color ) |        # Translate binary to 1 byte hex 
    while read line;do                    # Read line of HEX pairs
        for char in $line;do              # For each pair
            bin="$(printf "\x$char")"     # translate HEX to binary
            dec=$(printf "%d" 0x$char)    # translate to decimal
            [ $dec -lt 32  ] ||           # if character not printable
            ( [ $dec -gt 128 ] &&         # change bin to a single dot.
              [ $dec -lt 160 ] ) && bin="."
            str="$str$bin"
            printf '%s ' "$char"          # Print HEX value and a space
            ((i++))                       # count printed values
            if [ $i -gt 15 ] ;then
                i=0
                echo "  -  $str"
                str=""
              fi
      done
  done

New post, September 2016:

This could be useful in very specific cases (I've used it to manually copy GPT partitions between two disks, at a low level, without having /usr mounted...).

Yes, bash can read binary!

... but only one byte at a time... (because char(0) can't be read directly, the only way to read it correctly is to consider end-of-file: if no character is read and end-of-file is not reached, then the character read is a char(0)).
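That end-of-file logic can be made explicit by checking the status of read as well as the length of what it stored. A sketch, with classify_next as a hypothetical helper name:

```shell
#!/bin/bash
# classify_next (hypothetical name): report what the next one-byte read saw.
classify_next() {
    local c
    if ! LANG=C IFS= read -r -d '' -n 1 c; then
        echo EOF            # read failed: nothing left
    elif [[ -z $c ]]; then
        echo NUL            # read succeeded but stored nothing: a \0 byte
    else
        echo byte
    fi
}

printf 'x\0' | { classify_next; classify_next; classify_next; }   # byte, NUL, EOF
```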

This is more a proof of concept than a really useful tool: here is a pure bash version of hd (hexdump).

It uses recent bashisms, and requires bash v4.3 or higher.

#!/bin/bash

printf -v ascii \\%o {32..126}
printf -v ascii "$ascii"

printf -v cntrl %-20sE abtnvfr

values=()
todisplay=
address=0
printf -v fmt8 %8s
fmt8=${fmt8// / %02x}

while LANG=C IFS= read -r -d '' -n 1 char ;do
    if [ "$char" ] ;then
        printf -v char "%q" "$char"
        ((${#char}==1)) && todisplay+=$char || todisplay+=.
        case ${#char} in
         1|2 ) char=${ascii%$char*};values+=($((${#char}+32)));;
           7 ) char=${char#*\'\\};values+=($((8#${char%\'})));;
           5 ) char=${char#*\'\\};char=${cntrl%${char%\'}*};
                values+=($((${#char}+7)));;
           * ) echo >&2 ERROR: $char;;
        esac
      else
        values+=(0)
      fi

    if [ ${#values[@]} -gt 15 ] ;then
        printf "%08x $fmt8 $fmt8  |%s|\n" $address ${values[@]} "$todisplay"
        ((address+=16))
        values=() todisplay=
      fi
  done

if [ "$values" ] ;then
        ((${#values[@]}>8))&&fmt="$fmt8 ${fmt8:0:(${#values[@]}%8)*5}"||
            fmt="${fmt8:0:${#values[@]}*5}"
        printf "%08x $fmt%$((
                50-${#values[@]}*3-(${#values[@]}>8?1:0)
            ))s |%s|\n" $address ${values[@]} ''""'' "$todisplay"
fi
printf "%08x (%d chars read.)\n" $((address+${#values[@]})){,}

You could try/use this, but don't try to compare performance!

time hd < <(seq 1 10000|gzip)|wc
   1415   25480  111711
real    0m0.020s
user    0m0.008s
sys     0m0.000s

time ./hex.sh < <(seq 1 10000|gzip)|wc
   1415   25452  111669
real    0m2.636s
user    0m2.496s
sys     0m0.048s

Same job: 20 ms for hd vs 2636 ms for my bash script.

... but if you want to read 4 bytes in a file header, or even a sector address on a hard drive, this could do the job...

Upvotes: 44

syntaxerror

Reputation: 701

Although I would rather have expanded Perleone's own post (as it was his basic concept!), my edit was rejected after all, and I was kindly advised that this should be posted as a separate answer. Fair enough, so I will do that.

In short, the improvements over Perleone's original script:

  • seq would be total overkill here. A simple while loop, with the variable a used as a (likewise simple) counter, will do the job just fine (and much more quickly, too).
  • The max value, $(cat $1 | wc -c), must be assigned to a variable; otherwise it will be recalculated on every iteration, making this alternate script run even slower than the one it was derived from.
  • There's no need to waste a function on a simple usage info line. However, you need to know about the (mandatory) curly braces around the two commands: without the { }, the exit 1 command will be executed in either case, and the script interpreter will never make it to the loop. (Last note: ( ) will not do the same! Parentheses spawn a subshell, so exit 1 would only leave that subshell and the script would carry on, whilst curly braces execute the commands inside them in the current shell.)

#!/bin/bash

test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }

a=0
max=$(cat "$1" | wc -c)
while [[ $((++a)) -le $max ]]; do
  cat "$1" | head -c$a | tail -c1 | \
  xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done
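The difference between ( ) and { } for the exit 1 can be checked directly; a small sketch running both forms in a child bash:

```shell
#!/bin/bash
# ( ) runs exit in a subshell: only the subshell ends, the script goes on.
bash -c 'false || ( echo "in subshell"; exit 1 ); echo "still running"'
# { } runs exit in the current shell: the script really stops.
bash -c 'false || { echo "in current shell"; exit 1; }; echo "never printed"'
```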

Upvotes: 0

Willian Mainieri

Reputation: 39

I have a suggestion, but would like feedback from everybody, and mainly personal advice from user syntaxerror.

I don't know much about bash, but I thought it might be better to have "cat $1" stored in a variable... but the problem is that the echo command will also bring a small overhead, right?

test -s "$1" || { echo "Need a file with size greater than 0!"; exit 1; }
a=0
rfile=$(cat "$1")
max=$(echo "$rfile" | wc -c)
while [[ $((++a)) -lt $max ]]; do
  echo "$rfile" | head -c$a | tail -c1 | \
  xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

In my opinion it would have better performance, but I haven't performance-tested it...
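One thing to keep in mind with caching the file in a variable: command substitution strips every trailing newline, and bash cannot store NUL bytes, so the variable is not always byte-for-byte the file. A small sketch using a temporary file:

```shell
#!/bin/bash
tmp=$(mktemp)
printf 'ab\n\n' > "$tmp"          # 4 bytes: a, b and two newlines
rfile=$(cat "$tmp")               # $(...) removes both trailing newlines
echo "file: $(wc -c < "$tmp") bytes, variable: ${#rfile} chars"
rm -f "$tmp"
```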

Upvotes: 0

Perleone

Reputation: 4038

Yet another solution, using head, tail and printf:

for a in $( seq $( cat file.txt | wc -c ) ) ; do cat file.txt | head -c$a | tail -c1 | xargs -0 -I{} printf '%s %0X\n' {} "'{}" ; done

More readable:

#!/bin/bash

function usage() {
    echo "Need file with size > 0"
    exit 1
}

test -s "$1" || usage

for a in $( seq $( cat $1 | wc -c ) )
do
    cat $1 | head -c$a | tail -c1 | \
    xargs -0 -I{} printf '%c %#02x\n' {} "'{}"
done

Upvotes: 2

Grijesh Chauhan

Reputation: 58271

Using read, a single char can be read at a time as follows:

read -n 1 c
echo $c   

[ANSWER]

Try this:

#!/bin/bash
# data file
INPUT=/path/to/input.txt

# while loop
while IFS= read -r -n1 char
do
        # display one character at a time
    echo  "$char"
done < "$INPUT"

From this link


Second method: using awk, loop through char by char:

awk '{for(i=1;i<=length;i++) print substr($0, i, 1)}' /home/cscape/Desktop/table2.sql


Third way:

$ fold -1 /home/cscape/Desktop/table.sql  | awk '{print $0}'

EDIT: to print each char as a hex number:

Suppose I have a file named file:

$ cat file
123A3445F 

I have written an awk script (named x.awk) that reads char by char from file and prints each char in hex:

$ cat x.awk
#!/bin/awk -f

BEGIN    { _ord_init() }

function _ord_init(    low, high, i, t)
{
    low = sprintf("%c", 7) # BEL is ascii 7
    if (low == "\a") {    # regular ascii
        low = 0
        high = 127
    } else if (sprintf("%c", 128 + 7) == "\a") {
        # ascii, mark parity
        low = 128
        high = 255
    } else {        # ebcdic(!)
        low = 0
        high = 255
    }

    for (i = low; i <= high; i++) {
        t = sprintf("%c", i)
        _ord_[t] = i
    }
}
function ord(str,    c)
{
    # only first character is of interest
    c = substr(str, 1, 1)
    return _ord_[c]
}

function chr(c)
{
    # force c to be numeric by adding 0
    return sprintf("%c", c + 0)
}

{ x=$0; printf("%s , %x\n",$0, ord(x) )} 

To write this script I used the awk documentation.
Now you can use this awk script for your work as follows:

$ fold -1 /home/cscape/Desktop/file  | awk -f x.awk
1 , 31
2 , 32
3 , 33
A , 41
3 , 33
4 , 34
4 , 34
5 , 35
F , 46

NOTE: the value of A is 41 in hexadecimal. To print in decimal, change %x to %d in the last line of script x.awk.

Give it a Try!!

Upvotes: 4

anishsane

Reputation: 20980

Did you try xxd? It gives a hex dump directly, as you want.

For your case, the command would be:

xxd -c 1 /path/to/input_file | while read offset hex char; do
  echo "$hex"   # Do something with $hex (a loop body of only a comment is a syntax error)
done

Note: get the char from $hex rather than from the char field that read returns. This is required because read will not capture white space properly.
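A sketch of that conversion with printf -v, so no fork is needed; hex2char is a hypothetical name:

```shell
#!/bin/bash
# hex2char (hypothetical name): store the character for the two-digit hex value $1
# in the variable named by $2.
hex2char() {
    printf -v "$2" "\\x$1"
}

hex2char 41 c; echo "[$c]"    # [A]
hex2char 20 c; echo "[$c]"    # [ ] the space survives
```

Inside the xxd loop above it would be called as hex2char "$hex" c.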

Upvotes: 11

yasu

Reputation: 1364

Use read with the -n option.

while read -n 1 ch; do
  echo $ch
done < moemoe.txt

Upvotes: 0
