Chris Camacho

Reputation: 1174

How can I replace byte sequences in my data using Sed?

I have this rule in my Makefile, to replace ||| (three pipe characters; hex 7c 7c 7c) with CRLFNUL (carriage return + line feed + null; hex 0d 0a 00):

rom.hex: rom.txt  
    hexdump -C rom.txt | cut -c10-60 > rom.hex
    sed -i -e 's/  / /g' rom.hex
    sed -i -e 's/7c 7c 7c/0d 0a 00/g' rom.hex

This works some of the time, but if the output of hexdump splits a 7c 7c 7c sequence across two lines, sed does not match it.

The replacement has to be the same length as the match, so as not to shift the subsequent bytes.
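
The line-split failure described above can be reproduced without hexdump. The following is a minimal sketch that feeds sed a hand-written two-line dump (the byte values and variable names are made up for illustration) in which the sequence straddles the line break:

```shell
# Hypothetical two-line dump: "7c 7c 7c" is split across the line break.
dump='41 42 7c 7c
7c 43 44'

# sed processes one line at a time, so the pattern never sees
# all three values together.
result=$(printf '%s\n' "$dump" | sed 's/7c 7c 7c/0d 0a 00/g')

if [ "$result" = "$dump" ]; then
    echo "unchanged: the split sequence was not replaced"
fi
```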

Upvotes: 2

Views: 3637

Answers (3)

Toby Speight

Reputation: 30911

You could make the replacement first, before transforming into hex:

rom.hex: rom.txt
    sed -e 's/|||/\r\n\x00/g' $< | hexdump -vC | cut -c'10-60' >$@

Note that the backslash escapes are a GNU sed extension, so this is not a completely portable solution. If you need a portable sed command, you'll need to put it in a separate file, because you can't include a NUL in a command-line argument. The literal newline must be quoted, too:

s/|||/^M\
^@/g

For clarity, here is a hex dump of that script, which shows the control characters:

73 2f 7c 7c 7c 2f 0d 5c  0a 00 2f 67      |s/|||/.\../g|

Then the rule would be

rom.hex: rom.txt
    sed -f "transform.sed" $< | hexdump -vC | cut -c'10-60' >$@
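
The replace-at-the-source idea from the start of this answer can be checked on a tiny input. This sketch assumes GNU sed (for the \r, \n and \x00 escapes); the sample string and variable names are made up, and od stands in for hexdump only so that the resulting bytes are easy to read:

```shell
# Assumption: GNU sed, which understands \r, \n and \x00 in the replacement.
input='A|||B'

# Replace in the raw data first, then dump the result as hex byte values.
bytes=$(printf '%s\n' "$input" |
    sed -e 's/|||/\r\n\x00/g' |
    od -A n -t x1 -v |
    tr -s '\n' ' ' |
    sed 's/^ //; s/ $//')

printf '%s\n' "$bytes"
```

The three pipe characters become three bytes, so nothing after them shifts.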

Upvotes: 4

mklement0

Reputation: 439397

- Toby Speight's helpful answer elegantly bypasses the OP's problem by using GNU sed to replace data at the source, without needing to operate on a hex representation (his portable alternative doesn't work with BSD sed, but that's only because of the NUL character in the replacement string).
- The value of this answer is in solving the OP's problem exactly as stated, notably using tr -s '\n' ' ', and in providing a relatively simple portable solution at the bottom - it is of interest from a byte-representation / text processing perspective.
- See my other answer for a simpler solution that uses hexdump's formatting options to produce the desired output format directly.


Note:

  • The solutions below transform the byte-value representation of the input into a single line, so as to enable robust use of sed to replace values.
  • If you do want the fixed-width multi-line output that hexdump produces by default, pipe the output to ... | fmt -w48

The following command normalizes all whitespace in the output from hexdump -C:

hexdump -vC rom.txt | cut -c10-60 | tr -s '\n' ' ' > rom.hex

Note the addition of -v, which prevents loss of information.
Without -v, duplicates in adjacent repeating lines would be represented as *.

The result is:

  • a single line bookended by a leading and trailing space,

    • If you want to strip these, see the portable solution at the bottom.
  • with byte values all separated by a single space each; e.g.:
    23 21 2f 62 69 6e 2f 62 61 73 68 0a 0a 23 20 23 20 76 3d 24 5f 0a 23 20 23 20 65 63 68 6f 20 22 ....

  • Note that tr's -s ("squeeze") option, after having performed the translation (\n to a space, in this case), folds runs of multiple occurrences of the target character (a space, in this case) into a single occurrence.
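
The normalization described above can be seen on a small sample. This is a sketch only: the input is a hand-written stand-in for the output of hexdump -C | cut -c10-60 (made-up values), with a leading space, a double space in the middle of each line, and trailing padding:

```shell
# Hand-written stand-in for two lines of `hexdump -C | cut -c10-60` output.
normalized=$(printf ' 41 42  43 44 \n 45 46  \n' | tr -s '\n' ' ')

# Brackets make the leading and trailing space visible.
printf '[%s]\n' "$normalized"
```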

Thus:

  • The intermediate sed command (sed -i -e 's/  / /g' ...) to normalize the line-internal spaces is no longer needed.

  • The final sed command (sed -i -e 's/7c 7c 7c/ ...) can safely use space-separated values as the search string, without worrying about where the line breaks happened to be in hexdump -C's output.
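
As a quick check of the second point, here is a sketch with a hand-written two-line dump (made-up values) in which 7c 7c 7c straddles the line break; once the lines are joined, sed finds the sequence:

```shell
# The sequence is split: two values end line 1, the third starts line 2.
fixed=$(printf ' 41 42 7c 7c \n 7c 43 44 \n' |
    tr -s '\n' ' ' |
    sed 's/7c 7c 7c/0d 0a 00/g')

# Brackets make the leading and trailing space visible.
printf '[%s]\n' "$fixed"
```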

There is room for simplification:

  • A single pipeline can be used - no need to write to the file in an intermediate form and update it in place later.

    • As a side effect, because -i is no longer needed, the sed command becomes portable (POSIX-compliant). This form works on both Linux and BSD/OSX platforms, but the pipeline as a whole is still not strictly POSIX-compliant, because hexdump is a nonstandard utility; see the bottom for a strictly POSIX-compliant solution.
  • The special make variables $< (the first prerequisite, rom.txt) and $@ (the target, rom.hex) can be used.

  • There is no need for the -C option of hexdump, if only the byte values are needed; this allows simplification of the cut command, which, incidentally, strips the leading space from the output (and also makes tr's -s option unnecessary):

rom.hex: rom.txt  
    hexdump -v $< | cut -sd' ' -f2- | tr '\n' ' ' | sed 's/7c 7c 7c/0d 0a 00/g' > $@
  • cut -sd' ' -f2-:
    • -s means that lines not containing the delimiter (separator) specified with -d are skipped, which skips a trailing empty line (empty except for the byte-offset column) that hexdump may output.
    • -d' ' splits the input into fields using a single space as the delimiter.
    • -f2- outputs the 2nd field through the end of the line (-), effectively stripping the 1st field (the input-address offset column in hexdump's output).
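
The behavior of the cut options can be tried in isolation. In this sketch the input is hand-written (made-up values) and only shaped like offset-prefixed dump lines, with a final line that holds nothing but an offset:

```shell
# Line 1: offset column plus values. Line 2: offset only, no delimiter.
fields=$(printf '0000000 41 42 43\n0000003\n' | cut -sd' ' -f2-)

# Only the values of line 1 remain; -s dropped the delimiter-less line.
printf '%s\n' "$fields"
```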

To make the command fully portable, POSIX utility od can be used in lieu of the nonstandard hexdump utility.
Furthermore, an extra sed command is used to strip the leading and trailing space from the output.

rom.hex: rom.txt  
    od -t x1 -A n -v $< | tr -s '\n' ' ' | sed 's/^ //; s/ $//' | sed 's/7c 7c 7c/0d 0a 00/g' > $@
  • od -t x1 -A n -v outputs hex (x) bytes (1) across multiple lines of fixed width, similar to hexdump, except that -A n blanks out the input-address offset column; -v ensures that all bytes are represented; without it, adjacent duplicate lines would be represented as *.
  • tr -s '\n' ' ', as above, normalizes the whitespace to produce a single, long line with byte values separated by a single space, bookended by a single leading and trailing space.
  • sed 's/^ //; s/ $//' removes the leading and trailing space.
  • The rest of the command is as before.
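
Outside of make, the same pipeline can be exercised on a throwaway file. This is a sketch under stated assumptions: the file contents and variable names are made up, mktemp is assumed to be available, and the data is arranged so that the three pipe characters straddle od's 16-byte line boundary:

```shell
# 14 "0" characters (hex 30), then "|||", then "B": 18 bytes in total,
# so bytes 15-17 (the pipes) cross the 16-byte line boundary.
sample=$(mktemp)
printf '%014d|||B' 0 > "$sample"

hex=$(od -t x1 -A n -v "$sample" |
    tr -s '\n' ' ' |
    sed 's/^ //; s/ $//' |
    sed 's/7c 7c 7c/0d 0a 00/g')

rm -f "$sample"
printf '%s\n' "$hex"
```

The output still describes 18 bytes, which is the same-length requirement from the question.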

Upvotes: 2

mklement0

Reputation: 439397

- See my other answer for how to solve the problem as stated or if you need a POSIX-compliant solution.
- This answer is of interest from a byte-representation formatting perspective.


Note:

  • The solutions below transform the byte-value representation of the input into a single line, so as to enable robust use of sed to replace values.
  • If you do want the fixed-width multi-line output that hexdump produces by default, pipe the output to ... | fmt -w48

The problem can be bypassed by passing formatting options to hexdump:

hexdump -ve '1/1 "%02x "'

produces the desired output format as a single line directly (there will be a single trailing space).

  • -v prevents abbreviation of repeating bytes as *
  • -e '1/1 "%02x "':
    • 1/1 specifies that the following format string be applied to 1 unit of byte size 1, i.e., each byte.
    • "%02x " is the format string to apply to each byte: a 2-digit hex number followed by a space.
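
The effect of the "%02x " format can be previewed with the shell's own printf, which likewise reapplies its format to each argument. This is only an analogy, not hexdump itself; the decimal values (124 is 7c, the pipe character) are chosen for illustration:

```shell
# Each argument is printed as a 2-digit hex number followed by a space.
formatted=$(printf '%02x ' 124 124 124 13 10 0)

# Brackets make the trailing space visible.
printf '[%s]\n' "$formatted"
```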

To put it all together, using the special make variables $< (the first prerequisite, rom.txt) and $@ (the target, rom.hex):

rom.hex: rom.txt  
    hexdump -ve '1/1 "%02x "' $< | sed 's/7c 7c 7c/0d 0a 00/g' > $@

Alternative solution, using the (also nonstandard) xxd utility; like hexdump, however, it is available on both Linux and BSD/OSX:

rom.hex: rom.txt  
    xxd -p $< | tr -d '\n' | sed 's/../& /g; s/ $//' | sed 's/7c 7c 7c/0d 0a 00/g' > $@
  • xxd -p prints a stream of byte values without separators, broken into lines of fixed length.

  • tr -d '\n' removes the newlines from the output.

  • sed 's/../& /g; s/ $//' inserts a space after every 2 characters, then deletes the trailing space at the end of the line.
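
The tr and sed steps can be tried without xxd. In this sketch the input is a hand-written stand-in for xxd -p output (made-up values), split over two lines so that the sequence crosses the break:

```shell
# Join the lines, then insert a space after every 2 characters.
spaced=$(printf '41427c\n7c7c43\n' | tr -d '\n' | sed 's/../& /g; s/ $//')

# With the values space-separated on one line, the replacement applies.
replaced=$(printf '%s\n' "$spaced" | sed 's/7c 7c 7c/0d 0a 00/g')

printf '%s\n%s\n' "$spaced" "$replaced"
```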


Finally, as Toby Speight points out in a [since cleaned-up] comment, you can use the GNU version of od with the nonstandard -w option:

rom.hex: rom.txt  
    od -t x1 -A n -w1 -v $< | tr -d '\n' | sed 's/7c 7c 7c/0d 0a 00/g' > $@
  • od -t x1 -A n -w1 -v outputs hex (x) bytes (1) 1 byte at a time (-w1); -A n omits the input-address offset column; -v ensures that all bytes are represented; without it, adjacent duplicate lines would be represented as *.
  • tr -d '\n' simply removes all newlines, and since each line starts with a space, the result is a single long line with a leading space.
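
The effect of -v mentioned above can be observed with od's default line width (that is, without the GNU-only -w1). This sketch uses 64 identical bytes as made-up sample data; the variable names are illustrative:

```shell
# 64 identical bytes: the character "0" (hex 30), i.e. four 16-byte lines.
sample=$(printf '%064d' 0)

# Without -v, the repeated lines collapse into a single "*" line.
abbreviated=$(printf '%s' "$sample" | od -A n -t x1)

# With -v, every line is printed in full.
full=$(printf '%s' "$sample" | od -A n -t x1 -v)

star_lines=$(printf '%s\n' "$abbreviated" | grep -c '^\*$')
full_lines=$(printf '%s\n' "$full" | grep -c '30')

echo "lines consisting of '*' without -v: $star_lines"
echo "data lines with -v: $full_lines"
```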

Upvotes: 1
