Using sed to replace part of a text based on a regular expression result

Question

I need to read a log file, look for the text <KEY>any_number_here</KEY> and <KEYVAL>any_number_hereDany_number_here</KEYVAL>, and replace those numbers so they look like this:

<KEY>*************5683</KEY> and <KEYVAL>*************5683D00000000000000000000</KEYVAL>

This is an example of the log line:

2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<KEY>44444444444445683</KEY>2017-052<KEYVAL>44444444444445683D00000000000000000000</KEYVAL>"]

Notice the D separating the values in <KEYVAL>.

This is my first time trying sed. I could get the value inside the tag, but I don't know how to work on that value and replace part of it with *.

I have only the expression to get what's inside the tag:

sed -e 's/<KEY>\([[:digit:]]*\)<\/KEY>/ANOTHER SUBSTITUTION HERE?/' test.log

UPDATE Now I have this solution, which is the closest I got to what I need:

sed -e 's/[[:digit:]]\{13\}/(&)/g' -e 's/(.*)/*************/g' pan.txt

The problem with that is that it is replacing any () it finds with ************* and there are several () in the log file.

UPDATE 2

I think I found the solution:

sed -e 's/<KEY>[[:digit:]]\{13\}/(&)/g' -e 's/(.*)/<KEY>*************/g' pan.txt

This is working only for the KEY tag.

Benjamin W. · Accepted Answer

As a one-liner:

$ sed -r ':a;s|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|;s|(<KEYVAL>\**)[0-9]([0-9]*[0-9]{4}D[^<]*</KEYVAL>)|\1*\2|;ta' <<< "$var"
2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<KEY>*************5683</KEY>2017-052<KEYVAL>*************5683D00000000000000000000</KEYVAL>"]

This handles any number of digits and always just leaves the last four. To allow for this flexibility, the overall structure of the command is as follows:

:label   # Label to branch to
s///     # Substitute one digit for <KEY>
s///     # Substitute one digit for <KEYVAL>
t label  # If a substitution took place, branch back to 'label'

So as long as either substitution changed something, the t command (conditional branching) loops back to the label and we try to replace another digit.
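To see the loop in isolation, here is a minimal sketch on a made-up sample value. The <KEY> tag name comes from the question; the number itself is hypothetical. Using -E and separate -e expressions is my assumption for portability; in GNU sed, -r with a single semicolon-separated script should behave the same way.

```shell
# Hypothetical sample input; the <KEY> tag name is taken from the question.
line='<KEY>123456789</KEY>'

# Each pass masks exactly one digit; 't a' jumps back to ':a' only if the
# preceding s command changed something, so the loop stops at the last four.
masked=$(printf '%s\n' "$line" |
  sed -E -e ':a' \
         -e 's|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|' \
         -e 't a')

printf '%s\n' "$masked"
```

With nine digits, five passes succeed and the sixth fails to match, which ends the loop.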

Now, for the substitutions, they look as follows:

s|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|

This uses two capture groups: the first contains <KEY> and however many * follow it. Then comes a single, uncaptured digit (the one we replace in this pass of the loop), and then the second capture group, consisting of [0-9]*[0-9]{4}</KEY>, i.e., any number of digits followed by four digits and </KEY>. The substitution simply replaces the uncaptured digit with an asterisk.

Notice that I use extended regular expressions (-r option) so I don't have to escape (), and the pipe | as delimiter so I don't have to escape /.
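For comparison, here is a hedged sketch of the same single substitution written both ways. The sample value is hypothetical, and the <KEY> tags are assumed from the question. Only one pass is run here (no loop), so just the first digit is masked.

```shell
# Hypothetical sample input; assumes the number sits inside <KEY>...</KEY>.
line='<KEY>123456789</KEY>'

# ERE (-E, or -r in GNU sed): groups and intervals are written unescaped,
# and the '|' delimiter means the '/' in </KEY> needs no backslash.
ere=$(printf '%s\n' "$line" |
  sed -E 's|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|')

# BRE (the default): groups and intervals need backslashes, and with '/'
# as delimiter the closing tag has to be written as <\/KEY>.
bre=$(printf '%s\n' "$line" |
  sed 's/\(<KEY>\**\)[0-9]\([0-9]*[0-9]\{4\}<\/KEY>\)/\1*\2/')

printf '%s\n%s\n' "$ere" "$bre"
```

Both commands should print the same line; the extended form is just easier to read.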

The second substitution is almost the same:

s|(<KEYVAL>\**)[0-9]([0-9]*[0-9]{4}D[^<]*</KEYVAL>)|\1*\2|

The only difference is that it looks for KEYVAL instead of KEY, and between the four digits to be kept and the closing tag there is D[^<]*, i.e., a D followed by any number of characters other than the opening angle bracket.
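A small sketch of the KEYVAL variant on its own, again with a hypothetical sample value and the tag name assumed from the question. The point to check is that everything from the D onwards is left alone.

```shell
# Hypothetical sample input; the part after 'D' must survive unchanged.
line='<KEYVAL>123456789D000</KEYVAL>'

# Same loop as for <KEY>, but the second group also swallows 'D' and
# everything up to the next '<', so only digits before the D get masked.
masked=$(printf '%s\n' "$line" |
  sed -E -e ':a' \
         -e 's|(<KEYVAL>\**)[0-9]([0-9]*[0-9]{4}D[^<]*</KEYVAL>)|\1*\2|' \
         -e 't a')

printf '%s\n' "$masked"
```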

Alternative solution without looping

Definitely no one-liner material, but potentially faster for huge log files:

h        # Copy pattern space to hold space

# Remove everything except digits we want to replace from pattern space
s|.*<KEY>(.*)[0-9]{4}</KEY>.*|\1|

s/./*/g  # Replace digits with '*'
G        # Append hold space to pattern space

# Rearrange pattern space
s|(.*)\n(.*<KEY>).*([0-9]{4}</KEY>.*)$|\2\1\3|

# And the same for the KEYVAL part
h
s|.*<KEYVAL>(.*)[0-9]{4}D.*</KEYVAL>.*|\1|
s/./*/g
G
s|(.*)\n(.*<KEYVAL>).*([0-9]{4}D.*</KEYVAL>.*)$|\2\1\3|

This has to be stored in a separate file (some seds don't like the comments, so they can be removed) and then called like this:

$ sed -rf sedscr.sed <<< "$var"
2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<KEY>*************5683</KEY>2017-052<KEYVAL>*************5683D00000000000000000000</KEYVAL>"]
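To try the script-file approach end to end, here is a minimal sketch that writes just the <KEY> half of the script to a file and runs it. The file name mask_key.sed and the sample line are hypothetical, and the tag name is assumed from the question. The quoted heredoc keeps the backslashes intact.

```shell
# Hypothetical file name; only the <KEY> half of the script, comments removed.
cat > mask_key.sed <<'EOF'
h
s|.*<KEY>(.*)[0-9]{4}</KEY>.*|\1|
s/./*/g
G
s|(.*)\n(.*<KEY>).*([0-9]{4}</KEY>.*)$|\2\1\3|
EOF

# Hypothetical sample line with text before and after the tag.
line='id=7 <KEY>123456789</KEY> end'

# -E enables extended regular expressions; -f reads the script from the file.
masked=$(printf '%s\n' "$line" | sed -E -f mask_key.sed)

printf '%s\n' "$masked"
rm -f mask_key.sed
```

Note that this sketch assumes every input line contains the tag: on a line without it, the first substitution would not match, so s/./*/g would overwrite the whole line.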
