Reputation: 93
I have a log file that has embedded xml amongst normal STDOUT in it as follows:
2015-05-06 04:07:37.386 [INFO]Process:102 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>Test</FirstName><StudentSSN>123456789</StudentSSN><Address>123 Test Street</Address><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:103 - Application completed Successfully ==== 1
2015-05-06 04:07:37.386 [INFO]Process:104 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>Test2</FirstName><StudentSSN>323456789</StudentSSN><Address>234 Test Street</Address><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:105 - Application completed Successfully ==== 1
which I am successfully parsing as per a solution provided to me in Parsing and manipulating log file with embedded xml. As per the post there, I use a .sed file with commands as follows:
s|<FirstName>[^<]*</FirstName>|<FirstName>***</FirstName>|
s|<StudentSSN>[^<]*</StudentSSN>|<StudentSSN>***</StudentSSN>|
s|<Address>[^<]*</Address>|<Address>***</Address>|
s|<ParentSSN>[^<]*</ParentSSN>|<ParentSSN>***</ParentSSN>|
My question is: is there a way to do a wildcard match in the foo.sed file shown above? For example, I would like to match all *SSN tags and replace their contents with ***, rather than having one line for StudentSSN and another for ParentSSN, and still yield output like the following:
2015-05-06 04:07:37.386 [INFO]Process:102 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>***</FirstName><StudentSSN>***</StudentSSN><Address>*******</Address><ParentSSN>*********</ParentSSN> <APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:103 - Application completed Successfully ==== 1
2015-05-06 04:07:37.386 [INFO]Process:104 - Application submitted Successfully ==== 1
<APPLICATION><FirstName>***</FirstName><StudentSSN>*********</StudentSSN><Address>*****</Address><ParentSSN>*********</ParentSSN> <APPLICATIONID>2</APPLICATIONID></APPLICATION>
2015-05-06 04:07:39.386 [INFO] Process:105 - Application completed Successfully ==== 1
Thank you in advance
Upvotes: 0
Views: 1196
Reputation: 439247
choroba's helpful answer works well with GNU sed, because using \| for alternation in a basic regular expression (implied by the absence of the -r option) is only supported there.
Also, the OP has since expressed a desire to use patterns to match similar element names.
Here's a solution that makes use of extended regular expressions, which should work on both Linux (GNU Sed) and BSD/OSX platforms (BSD Sed):
sed -E 's%<([^/>]*Name|[^/>]*SSN|Address[^/>]*)>[^<]*%<\1>***%g' file
Note the use of [^/>]* rather than .* so as to ensure that the matches remain confined to the opening tag; excluding / as well keeps closing tags such as </FirstName> from matching.
The above command is the equivalent of the following GNU Sed command using a basic regular expression - note the need to escape (, ), and |:
sed 's%<\([^/>]*Name\|[^/>]*SSN\|Address[^/>]*\)>[^<]*%<\1>***%g' file
Note that it is the use of alternation (\|) that makes this command not portable, because POSIX basic regular expressions do not support it.
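To check the extended-regex approach against one of the sample lines from the question, a small sketch along these lines may help. It uses [^/>]* in the tag-name patterns so that a closing tag such as </FirstName> cannot match; the variable names line and masked are only for illustration.

```shell
# One XML line copied from the sample log in the question.
line='<APPLICATION><FirstName>Test</FirstName><StudentSSN>123456789</StudentSSN><Address>123 Test Street</Address><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>'

# Mask the contents of any element whose name ends in Name or SSN,
# or starts with Address. -E selects extended regular expressions.
masked=$(printf '%s\n' "$line" |
  sed -E 's%<([^/>]*Name|[^/>]*SSN|Address[^/>]*)>[^<]*%<\1>***%g')

printf '%s\n' "$masked"
```

Elements that match none of the three patterns, such as APPLICATIONID, should pass through unchanged.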
Upvotes: 1
Reputation: 241988
You can use alternation with \|. I changed the delimiter to % because of that:
sed -e 's%<\(FirstName\|StudentSSN\|Address\|ParentSSN\)>[^<]*</\1>%<\1>***</\1>%g'
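If the commands are kept in a .sed file as in the question, and \| is not available, one possible alternative is to keep one s command per wildcard pattern, which needs no alternation at all. The following is only a sketch under that assumption: the temporary script file and the variable names are hypothetical, and the patterns use [^/>]* so that closing tags are not matched.

```shell
# Write a hypothetical sed script with one wildcard rule per line.
script=$(mktemp)
cat > "$script" <<'EOF'
s%<\([^/>]*Name\)>[^<]*%<\1>***%g
s%<\([^/>]*SSN\)>[^<]*%<\1>***%g
s%<\(Address[^/>]*\)>[^<]*%<\1>***%g
EOF

# A shortened sample line with two *SSN elements.
line='<APPLICATION><StudentSSN>123456789</StudentSSN><ParentSSN>123456780</ParentSSN><APPLICATIONID>2</APPLICATIONID></APPLICATION>'

# Apply the script; basic regular expressions only, no alternation.
masked=$(printf '%s\n' "$line" | sed -f "$script")
printf '%s\n' "$masked"

rm -f "$script"
```

A single [^/>]*SSN rule covers both StudentSSN and ParentSSN here, so the per-tag lines from the original .sed file collapse into one line per naming pattern.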
Upvotes: 1