b degnan

Reputation: 692

BASH: Padding a series of HEX values based on the longest string

I have this odd condition where I've been given a series of HEX values that represent binary data. The interesting thing is that they are occasionally different lengths, such as:

40000001AA
0000000100
A0000001
000001
20000001B0
40040001B0

I would like to append 0's on the end to make them all the same length, based on the longest entry. In the example above I have four entries that are 10 characters long, each terminated by '\n', and two short ones (in the actual data, I have 200k entries with about 1k short ones). What I would like to do is figure out the longest string in the file, and then go through and pad the short ones; however, I haven't been able to figure it out. Any suggestions would be appreciated.

Upvotes: 1

Views: 303

Answers (4)

Ed Morton

Reputation: 203324

In general, zero-padding a string on the left, on the right, or on both sides looks like this (using 5 as the desired field width, for example):

$ echo '17' | awk '{printf "%0*s\n", 5, $0}'
00017

$ echo '17' | awk '{printf "%s%0*s\n", $0, 5-length(), ""}'
17000

$ echo '17' | awk '{w=int((5+length())/2); printf "%0*s%0*s\n", w, $0, 5-w, ""}'
01700

$ echo '17' | awk '{w=int((5+length()+1)/2); printf "%0*s%0*s\n", w, $0, 5-w, ""}'
00170

So for your example, reading the file twice (first pass to find the longest line, second pass to pad):

$ awk '{cur=length()} NR==FNR{max=(cur>max?cur:max);next} {printf "%s%0*s\n", $0, max-cur, ""}' file file
40000001AA
0000000100
A000000100
0000010000
20000001B0
40040001B0
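
If the data arrives on a pipe and cannot be read twice, a single-pass variant is possible. The following is only a sketch under two assumptions: the whole input fits in memory (200k short lines should), and `pad_hex` is a made-up helper name. It appends the zeros in a loop rather than through `printf`, because the C standard leaves the `0` flag undefined for `%s`, so an awk that defers to the C library may pad with spaces instead.

```shell
# pad_hex: hypothetical helper name, not part of the answer above.
# Reads lines on stdin and right-pads each with "0" to the longest line's length.
pad_hex() {
    awk '
        # Buffer every line and track the longest length seen so far.
        { line[NR] = $0; if (length($0) > max) max = length($0) }
        END {
            for (i = 1; i <= NR; i++) {
                s = line[i]
                # Append zeros one at a time until the line reaches max.
                while (length(s) < max) s = s "0"
                print s
            }
        }'
}

# Usage: three of the sample values from the question.
printf '%s\n' 40000001AA A0000001 000001 | pad_hex
```

The trade-off is memory for convenience: the two-pass command above keeps nothing in memory but needs a real file.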

Upvotes: 1

anubhava

Reputation: 785058

Using standard two-pass awk:

awk 'NR==FNR{if (len < length()) len=length(); next}
     {s = sprintf("%-*s", len, $0); gsub(/ /, "0", s); print s}' file file

40000001AA
0000000100
A000000100
0000010000
20000001B0
40040001B0

Or using GNU wc (its -L option prints the length of the longest line) with awk:

awk -v len="$(wc -L < file)" '
   {s = sprintf("%-*s", len, $0); gsub(/ /, "0", s); print s}' file

40000001AA
0000000100
A000000100
0000010000
20000001B0
40040001B0
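
On systems where `wc` has no `-L` option, the maximum length can be computed with awk instead. This is a minimal sketch; `max_len` is a hypothetical helper name, and it counts characters with awk's `length()`, which for plain hex digits should agree with what `wc -L` reports.

```shell
# max_len FILE: hypothetical helper that prints the length of the longest line.
max_len() {
    awk '{ n = length($0); if (n > max) max = n } END { print max + 0 }' "$1"
}

# Usage (assumes a file named "file", as in the commands above):
#   awk -v len="$(max_len file)" '
#      {s = sprintf("%-*s", len, $0); gsub(/ /, "0", s); print s}' file
```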

Upvotes: 3

Arkadiusz Drabczyk

Reputation: 12383

As you use Bash, there is a good chance that you also use other GNU tools. In that case wc can tell you the length of the longest line in the file with its -L option. Example:

$ wc -L /tmp/HEX
10 /tmp/HEX

Padding can be done like this:

$ while read i; do echo $(echo "$i"0000000000 | head -c 10); done < /tmp/HEX
40000001AA
0000000100
A000000100
0000010000
20000001B0
40040001B0

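A one-liner that takes the width from wc -L instead of hard-coding 10. The line is printed once and the zeros separately, because printf reuses its format string for every argument and would otherwise repeat the line:

while read -r i; do { printf '%s' "$i"; eval "printf '%.s0' {1..$(wc -L < /tmp/HEX)}"; } | head -c "$(wc -L < /tmp/HEX)"; echo; done < /tmp/HEX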
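
The same idea can be written without `eval` and without spawning `head` for every line. What follows is a sketch, not the method from this answer: `pad_lines` is a hypothetical name, the longest length is found by the shell itself rather than by `wc -L`, and a shell loop over 200k lines will probably be noticeably slower than awk.

```shell
# pad_lines FILE: hypothetical helper name.
# The body runs in a subshell, so its variables do not leak into the caller.
pad_lines() (
    file=$1
    max=0
    # Pass 1: find the longest line length.
    while IFS= read -r line || [ -n "$line" ]; do
        if [ "${#line}" -gt "$max" ]; then max=${#line}; fi
    done < "$file"
    # A run of $max zeros to take the padding from.
    zeros=$(printf "%${max}s" '' | tr ' ' '0')
    # Pass 2: print each line followed by exactly (max - length) zeros;
    # the computed precision in %.Ns truncates the run of zeros.
    while IFS= read -r line || [ -n "$line" ]; do
        printf "%s%.$((max - ${#line}))s\n" "$line" "$zeros"
    done < "$file"
)

# Usage: pad_lines /tmp/HEX
```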

Upvotes: 2

oybek

Reputation: 650

Let's suppose you have these values in a file:

file=/tmp/hex.txt

Find the length of the longest number:

longest=$(wc -L < "$file")

Now pad each number in the file with trailing zeroes:

while read -r number; do
    printf "%-${longest}s\n" "$number" | sed 's/ /0/g'
done < "$file"

This is what the script prints to stdout:

40000001AA
0000000100
A000000100
0000010000
20000001B0
40040001B0
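
With 200k entries it may be worth confirming that every line really ended up the same length. A small check could look like this; `same_length` is a hypothetical helper name and only a sketch.

```shell
# same_length: hypothetical check.
# Exits 0 when every line on stdin has the same length, 1 otherwise.
same_length() {
    awk 'NR == 1 { n = length($0) }              # remember the first length
         length($0) != n { bad = 1; exit }       # first mismatch: jump to END
         END { exit bad + 0 }'
}

# Usage: printf '%s\n' 40000001AA A000000100 | same_length && echo "all equal"
```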

Upvotes: 1
