Reputation:
I need to read a log file and look for the text <KEY>any_number_here</KEY>
and <KEYVAL>any_number_hereDany_number_here</KEYVAL>
and replace those numbers so it looks like this:
<KEY>*************5683</KEY>
and <KEYVAL>*************5683D00000000000000000000</KEYVAL>
This is an example of the log line:
2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<Document xmlns='bla'><KEY>44444444444445683</KEY><DATE>2017-05</DATE><DATA>2</DATA><KEYVAL>44444444444445683D00000000000000000000</KEYVAL>"]
Notice the D separating the values in <KEYVAL>.
This is my first time trying sed. I could get the value inside the <KEY> tag, but I don't know how to work on that value and replace part of it with *.
So far I only have the expression that captures what's inside the <KEY> tag:
sed -e 's/<KEY>\([[:digit:]]*\)<\/KEY>/ANOTHER SUBSTITUTION HERE?/' test.log
UPDATE Now I have this solution, which is the closest I got to what I need:
sed -e 's/<KEY>[[:digit:]]\{13\}/(&)/g' -e 's/(.*)/<KEY>*************/g' pan.txt
The problem with that is that it replaces any () it finds with <KEY>*************, and there are several () in the log file.
UPDATE 2
I think I found the solution:
sed -e 's/<KEY>[[:digit:]]\{13\}/(&)/g' -e 's/(.*)/<KEY>*************/g' pan.txt
This is working only for the KEY tag.
Upvotes: 0
Views: 62
Reputation: 52536
As a one-liner:
$ sed -r ':a;s|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|;s|(<KEYVAL>\**)[0-9]([0-9]*[0-9]{4}D[^<]*</KEYVAL>)|\1*\2|;ta' <<< "$var"
2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<Document xmlns=bla><KEY>*************5683</KEY><DATE>2017-05</DATE><DATA>2</DATA><KEYVAL>*************5683D00000000000000000000</KEYVAL>"]
This handles any number of digits and always just leaves the last four. To allow for this flexibility, the overall structure of the command is as follows:
:label # Label to branch to
s/// # Substitute one digit for <KEY>
s/// # Substitute one digit for <KEYVAL>
t label # If a substitution took place, branch back to 'label'
So as long as any of the substitutions did something, the t command (conditional branching) loops back and tries to replace another digit.
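To see the loop in isolation, here is a minimal sketch on a bare number rather than on the tagged log line. The simplified pattern and the variable name masked are mine, for illustration only, and I use -E instead of -r here since both GNU and BSD sed accept it.

```shell
# Mask everything except the last four digits, one digit per pass.
#   :a    defines the label
#   s///  turns the first still-unmasked digit into '*', but only while
#         at least four more digits follow it
#   ta    jumps back to :a only if the s command changed something
masked=$(printf '%s\n' '44444444444445683' |
  sed -E ':a
s/^(\**)[0-9]([0-9]{4,})$/\1*\2/
ta')
printf '%s\n' "$masked"
```

Once only four digits are left, the s command no longer matches, t does not branch, and sed prints the line.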
Now, for the substitutions, they look as follows:
s|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|
This uses two capture groups: the first contains <KEY> and however many * come after it. Then comes a single, uncaptured digit (which we'll replace in this iteration), and then the second capture group consisting of [0-9]*[0-9]{4}</KEY>, i.e., any number of digits ending in four digits and </KEY>. The substitution simply replaces the uncaptured digit with an asterisk.
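To make the effect of a single iteration visible, this sketch runs the KEY substitution without the loop, twice by hand. The variable names line, one_pass and two_pass are placeholders of mine, and -E stands in for -r.

```shell
line='<KEY>44444444444445683</KEY>'

# First pass: \** matches nothing yet, so the first digit becomes '*'.
one_pass=$(printf '%s\n' "$line" |
  sed -E 's|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|')

# Second pass over that result: \** now absorbs the existing '*',
# so the next digit is the one that gets replaced.
two_pass=$(printf '%s\n' "$one_pass" |
  sed -E 's|(<KEY>\**)[0-9]([0-9]*[0-9]{4}</KEY>)|\1*\2|')

printf '%s\n%s\n' "$one_pass" "$two_pass"
```

Each pass adds exactly one asterisk, which is why the command has to be repeated until it stops matching.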
Notice that I use extended regular expressions (-r option) so I don't have to escape (), and the pipe | as delimiter so I don't have to escape /.
The second substitution is almost the same:
s|(<KEYVAL>\**)[0-9]([0-9]*[0-9]{4}D[^<]*</KEYVAL>)|\1*\2|
The only difference is that it looks for KEYVAL instead of KEY, and between the closing tag and the four digits to be kept there is D[^<]*, i.e., a D followed by any number of characters other than the opening angle bracket.
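As a quick check of the KEYVAL variant on its own, here is a sketch that loops only that substitution over the tag from the question. The variable names are mine, and -E is used in place of -r.

```shell
line='<KEYVAL>44444444444445683D00000000000000000000</KEYVAL>'

# Only digits before the last four ahead of 'D' are masked;
# D[^<]* keeps the D and everything up to the closing tag as it is.
masked_keyval=$(printf '%s\n' "$line" |
  sed -E ':a
s|(<KEYVAL>\**)[0-9]([0-9]*[0-9]{4}D[^<]*</KEYVAL>)|\1*\2|
ta')
printf '%s\n' "$masked_keyval"
```

The part after the D is never touched, because the only digit that can be replaced sits before the second capture group.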
Definitely no one-liner material, but potentially faster for huge log files:
h # Copy pattern space to hold space
# Remove everything except digits we want to replace from pattern space
s|.*<KEY>(.*)[0-9]{4}</KEY>.*|\1|
s/./*/g # Replace digits with '*'
G # Append hold space to pattern space
# Rearrange pattern space
s|(.*)\n(.*<KEY>).*([0-9]{4}</KEY>.*)$|\2\1\3|
# And the same for the KEYVAL part
h
s|.*<KEYVAL>(.*)[0-9]{4}D.*</KEYVAL>.*|\1|
s/./*/g
G
s|(.*)\n(.*<KEYVAL>).*([0-9]{4}D.*</KEYVAL>.*)$|\2\1\3|
This has to be stored in a separate file (some seds don't like the comments, so they can be removed) and then called like this:
$ sed -rf sedscr.sed <<< "$var"
2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<Document xmlns=bla><KEY>*************5683</KEY><DATE>2017-05</DATE><DATA>2</DATA><KEYVAL>*************5683D00000000000000000000</KEYVAL>"]
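One caveat with the script above: it assumes every input line contains both tags. On a line without them the first s command does not match, so s/./*/g would mask the whole line. A possible way around that, sketched here under my own assumptions, is to wrap each half in an address so it only runs on lines that contain the tag. The guard patterns, the temporary file and the two-line test input are not part of the original script, and -E is used in place of -r.

```shell
# Write the guarded script to a temporary file (hypothetical location).
script=$(mktemp)
cat > "$script" <<'EOF'
/<KEY>[0-9]{4,}<\/KEY>/{
h
s|.*<KEY>(.*)[0-9]{4}</KEY>.*|\1|
s/./*/g
G
s|(.*)\n(.*<KEY>).*([0-9]{4}</KEY>.*)$|\2\1\3|
}
/<KEYVAL>[0-9]{4,}D[^<]*<\/KEYVAL>/{
h
s|.*<KEYVAL>(.*)[0-9]{4}D.*</KEYVAL>.*|\1|
s/./*/g
G
s|(.*)\n(.*<KEYVAL>).*([0-9]{4}D.*</KEYVAL>.*)$|\2\1\3|
}
EOF

# First line has no tags and should pass through untouched.
guarded=$(printf '%s\n' 'a line without tags' \
  '<KEY>44444444444445683</KEY><KEYVAL>44444444444445683D00</KEYVAL>' |
  sed -E -f "$script")
rm -f "$script"
printf '%s\n' "$guarded"
```

The commands inside the braces are unchanged from the script above; only the two address lines and the closing braces are new.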
Upvotes: 2
Reputation: 8769
$ cat inputfile
2016/02/01 04:20:21 [18f][00000000000001526][0][00000000000000] Some text here: [size: 000 communication_format: ISO0000 data: "<Document xmlns='bla'><KEY>44444444444445683</KEY><DATE>2017-05</DATE><DATA>2</DATA><KEYVAL>44444444444445683D00000000000000000000</KEYVAL>"]
$ egrep -o -e '<KEY>[0-9]+</KEY>' -e '<KEYVAL>[0-9]+D[0-9]+</KEYVAL>' inputfile | sed -r -e 's/^(<KEY>.*)([0-9]{4})(<\/KEY>)$/\1\n\2\3/g;' -e 's/^(<KEYVAL>.*)([0-9]{4}D[0-9]+)(<\/KEYVAL>)$/\1\n\2\3/g' | sed -e '1~2 s/[0-9]/*/g' | sed -n 'N;s/\n//g;p'
<KEY>*************5683</KEY>
<KEYVAL>*************5683D00000000000000000000</KEYVAL>
This handles any number of digits before 5683 in KEY, and any number of digits before and after 5683D in KEYVAL. Also, 5683 can be any 4 digits.
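To show what the middle stages of that pipeline do, here is a sketch for the KEY tag only. It relies on GNU sed behaviour (\n in the replacement and the 1~2 "every second line, starting at line 1" address), so it may not work with other seds. The variable names split and joined are mine.

```shell
# Stage 1: put a line break before the four digits that must survive.
split=$(printf '%s\n' '<KEY>44444444444445683</KEY>' |
  sed -E 's/^(<KEY>.*)([0-9]{4})(<\/KEY>)$/\1\n\2\3/')

# Stage 2: mask digits on odd-numbered lines only (the first halves),
# then glue each pair of lines back together.
joined=$(printf '%s\n' "$split" |
  sed '1~2 s/[0-9]/*/g' |
  sed -n 'N;s/\n//;p')

printf '%s\n' "$split" "$joined"
```

Keep in mind that the pipeline starts with egrep -o, so it prints only the masked tags, not the surrounding log line.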
Upvotes: 1