savager1

Reputation: 71

sed capture everything until pattern using variable as string

I have a bunch of text that is represented in a format similar to XML.

<File>
    <abc1>
        <Hex>
            <item>
                <data>AB CD 34 43</data>
            </item>
         </Hex>
    </abc1>
</File>

I use an application (ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 2.6.32, not stripped) which takes a file as an argument and decodes it.

I can successfully run:

./application /tmp/file and it decodes correctly

I can also capture everything up to and including the line that contains the hex string below. This works well, but of course I cannot type the data into the command by hand each time before running it; I need to supply it dynamically through a variable:

./application /tmp/file | sed '/AB CD 34 43/q'

But what I am unable to do is pass a variable instead of the hex string:

./application /tmp/file | sed '/`echo -n "$value"`/q'

I don't mind what I use, sed/awk/grep.

My main aim is to extract everything up to the hex address, then run another command to do the same in the other direction, so that I am left with roughly one complete "frame", if you will. Then I can just tail it by the frame size so it contains exactly one complete frame.

Upvotes: 0

Views: 534

Answers (2)

dan

Reputation: 5251

Your command substitution (backticks) is inside hard (single) quotes ('), so the shell won't expand it. You don't need echo, either, just the variable itself. You must still use soft (double) quotes ("), for the variable to be expanded.

hex='AB CD 34 43'

./application /tmp/file | sed "/$hex/q"

When using shell variables in sed addresses, remember that the address is a regular expression, not a literal string. sed has no option like grep's -F, and such an option would not help with characters that are special to sed's own syntax (the / delimiter, {}, ; and so on), so the only option is to escape the value beforehand. For hex it's fine, as it contains only alphanumeric and space characters.
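For values that are not plain hex, the escaping step could look roughly like the sketch below. The helper name escape_sed_bre is hypothetical (it is not part of sed), and the sketch assumes sed's default basic-regex mode with / as the address delimiter.

```shell
# Hypothetical helper: backslash-escape the characters that are special in a
# basic regular expression, plus the / delimiter, so the value matches literally.
# Assumes sed's default BRE mode; with -E, characters such as + ? ( ) { } |
# would need escaping as well.
escape_sed_bre() {
    printf '%s\n' "$1" | sed 's|[][\\.*^$/]|\\&|g'
}

value='a.b*c/d'
pattern=$(escape_sed_bre "$value")

# Prints every line up to and including the first one that contains $value literally.
printf '%s\n' 'first' 'axb*c/d' 'a.b*c/d tail' 'last' | sed "/$pattern/q"
```

Without the escaping, the unescaped / in the value would end the address early and sed would reject the script.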

Also, as you said, sed will print up to the line containing the pattern. So any text after the pattern, on the same line, will be included. Perhaps this is fine for your requirements, but I'll include a few methods of printing up to (and including) the pattern exactly, but no further.

sed

sed -E "/$hex/{s/($hex)(.*)/\1/;q}"

awk

awk -v RS="$hex" '{print($0 RS); exit}'

Note: POSIX only defines the behaviour for a single-character RS, but many implementations accept a regex*

shell

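decoded=$(./application /tmp/file)
# Strip everything from the first $hex onwards, then re-append $hex so the match itself is kept
# (assumes $hex occurs in the output)
truncated=${decoded%%"$hex"*}$hex
printf '%s\n' "$truncated"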

This reads all the data into memory first, so it can be slow for very large data, but it is fast for small data because it is pure shell. Quoting $hex inside the expansion makes the shell treat it as a literal string instead of a glob pattern.

* From https://www.gnu.org/software/gawk/manual/html_node/gawk-split-records.html

RS as a regular expression [is a gawk extension]

mawk has allowed RS to be a regexp for decades. As of October, 2019, BWK awk also supports it.

If the pattern $hex occurs more than once, these all print up to the first occurrence of $hex (including it), but they can easily be modified to print up to the last occurrence.
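For the pure-shell variant, for instance, switching to the last occurrence is a matter of using % (shortest matching suffix) instead of %% (longest matching suffix). A minimal sketch with made-up sample data; the variable names are only illustrative, and $hex is re-appended on the assumption that it actually occurs in the data:

```shell
hex='AB CD 34 43'
decoded='head AB CD 34 43 middle AB CD 34 43 tail'   # sample data, not real output

# %% strips the longest suffix matching "$hex"*, i.e. from the FIRST occurrence on.
up_to_first=${decoded%%"$hex"*}$hex
# %  strips the shortest such suffix, i.e. from the LAST occurrence on.
up_to_last=${decoded%"$hex"*}$hex

printf '%s\n' "$up_to_first"   # head AB CD 34 43
printf '%s\n' "$up_to_last"    # head AB CD 34 43 middle AB CD 34 43
```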

Finally, when doing stuff like this, remember that the hexdump utility accepts printf-like format strings to control the output format. E.g. hexdump -ve '1/1 "%02x "' mybinfile; echo dumps the bytes of mybinfile as a space-delimited list in hex.

Upvotes: 0

KamilCuk

Reputation: 141533

./application /tmp/file | sed "/$value/q"

Research quoting in the shell and the difference between single and double quotes.
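As a quick illustration of that difference, the sketch below prints the sed script exactly as sed would receive it under each quoting style. The value is the sample hex string from the question.

```shell
value='AB CD 34 43'

# Single quotes: nothing is expanded, so sed would receive the literal text $value.
printf '%s\n' '/$value/q'    # /$value/q

# Double quotes: the shell expands the variable before sed ever sees the script.
printf '%s\n' "/$value/q"    # /AB CD 34 43/q
```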

For parsing XML files use XML aware tools, like xmlstarlet. Do not parse XML with regex.

Do not use ` backticks - use $(...) instead.

Upvotes: 1
