Ulrik

Reputation: 1141

Use sed to extract ascii hex string from a file

I have a file that looks like this:

$ some random
$ text
00ab2c3f03$ and more
random text
1a2bf04$ more text
blah blah

and the code that looks like this:

sed -ne 's/\(.*\)$ and.*/\1/p'  "file.txt" > "output1.txt"
sed -ne 's/\(.*\)$ more.*/\1/p' "file.txt" > "output2.txt"

That gives me 00ab2c3f03 in output1.txt and 1a2bf04 in output2.txt.

So it extracts everything from the beginning of the line up to the shell prompt and stores it in a file, once for each of the two instances.

The problem is that the file sometimes looks like this:

/dir # some random
/dir # text
00ab2c3f03/dir # and more
random text
345fabd0067234234/dir # more text
blah blah

And I want to make a universal extractor that either:

But I'm not good enough with sed to come up with an easy solution myself...

Upvotes: 1

Views: 1736

Answers (2)

ooga

Reputation: 15501

This seems better to me:

sed -nr 's#([[:xdigit:]]+)[$/].*#\1#p' file
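To check this against both prompt styles from the question, here is a small sketch. The file names sample.txt and hex.txt are made up for the example, and -E is used in place of -r (GNU sed treats them the same; BSD sed only documents -E). Note that the pattern is not anchored, so it would also fire on any other line where hex digits are directly followed by $ or /.

```shell
# Build a sample file containing both prompt styles from the question.
cat > sample.txt <<'EOF'
$ some random
$ text
00ab2c3f03$ and more
random text
1a2bf04$ more text
blah blah
/dir # some random
/dir # text
00ab2c3f03/dir # and more
random text
345fabd0067234234/dir # more text
blah blah
EOF

# Keep only the hex run that sits directly before a $ or / and drop the rest of the line.
sed -nE 's#([[:xdigit:]]+)[$/].*#\1#p' sample.txt > hex.txt
cat hex.txt
```

This prints all four hex strings in input order, so the two cases still have to be separated if they belong in different files.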

Upvotes: 0

Avinash Raj

Reputation: 174696

I think you want output like this:

$ cat file
$ some random
$ text
00ab2c3f03$ and more
random text
1a2bf04$ more text
blah blah
/dir # some random
/dir # text
00ab2c3f03/dir # and more
random text
345fabd0067234234/dir # more text
blah blah

$ sed -ne 's/\([a-f0-9]*\).* and more.*/\1/p' file
00ab2c3f03
00ab2c3f03

$ sed -ne 's/\([a-f0-9]*\).* more text.*/\1/p' file
1a2bf04
345fabd0067234234

You could also try the GNU sed commands below. Because / is present in your input, I changed the sed delimiter to ~:

$ sed -nr 's~([a-f0-9]*)\/*\$*.* and more.*~\1~p' file
00ab2c3f03
00ab2c3f03

$ sed -nr 's~([a-f0-9]*)\/*\$*.* more text.*~\1~p' file
1a2bf04
345fabd0067234234

Explanation:

  • ([a-f0-9]*) - Captures the run of (lowercase) hex digits and stores it in a group.

  • The OP said a / or $ may appear right after the hex digits, so the regex has \/*\$* (/ zero or more times, then $ zero or more times) after the capturing group.

  • The first command only acts on lines that contain the string "and more".

  • The second one only acts on lines that contain "more text", because the OP wants the two outputs in two different files.
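Since the OP wants two output files, both substitutions can also run in a single pass using the w flag of the s command. This is only a sketch under a few assumptions that are not in the commands above: the hex string starts at the beginning of the line, at least one hex digit is required (so lines without a leading hex run are skipped instead of producing an empty line), and the input is a made-up sample.txt. The output names output1.txt and output2.txt are taken from the question.

```shell
# Sample input with both prompt styles from the question.
cat > sample.txt <<'EOF'
$ some random
$ text
00ab2c3f03$ and more
random text
1a2bf04$ more text
blah blah
/dir # some random
/dir # text
00ab2c3f03/dir # and more
random text
345fabd0067234234/dir # more text
blah blah
EOF

# One pass over the input: each successful substitution is written to its own file.
# ~ is the delimiter because / occurs in the data; \{1,\} means "one or more" in basic regex syntax.
sed -n \
  -e 's~^\([[:xdigit:]]\{1,\}\)[$/].* and more.*~\1~w output1.txt' \
  -e 's~^\([[:xdigit:]]\{1,\}\)[$/].* more text.*~\1~w output2.txt' \
  sample.txt

cat output1.txt
cat output2.txt
```

With this input, output1.txt gets the two "and more" strings and output2.txt gets the two "more text" strings.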

Upvotes: 1
