Using wildcards with sed

Question

I have a log file that has embedded xml amongst normal STDOUT in it as follows:

2015-05-06 04:07:37.386 [INFO]Process:102 - Application submitted Successfully ==== 1
Test123456789123 Test Street
1234567802
2015-05-06 04:07:39.386 [INFO] Process:103 - Application completed Successfully ==== 1
2015-05-06 04:07:37.386 [INFO]Process:104 - Application submitted Successfully ==== 1
Test2323456789234 Test Street
1234567802
2015-05-06 04:07:39.386 [INFO] Process:105 - Application completed Successfully ==== 1

which I am successfully parsing as per a solution provided to me in "Parsing and manipulating log file with embedded xml". As per the post there, I use a .sed file with commands as follows:

s|[^<]*|***|
s|[^<]*|***|
s|[^<]*|***|
s|[^<]*|***|

My question is, is there a way to do a wild card match in the foo.sed file you have up above? So for example, if I wanted to match all *SSN tags and replace those with a **, rather than have one line for StudentSSN and another for ParentSSN and still yield the output as below:

2015-05-06 04:07:37.386 [INFO]Process:102 - Application submitted Successfully ==== 1
*************
*********   2
2015-05-06 04:07:39.386 [INFO] Process:103 - Application completed Successfully ==== 1
2015-05-06 04:07:37.386 [INFO]Process:104 - Application submitted Successfully ==== 1
*****************
*********   2
2015-05-06 04:07:39.386 [INFO] Process:105 - Application completed Successfully ==== 1

Thank you in advance

mklement0 · Accepted Answer

choroba's helpful answer works well with GNU sed, because using \| for alternation in a basic regular expression (implied by the absence of the -r option) is only supported there.

Also, the OP has since expressed a desire to use patterns to match similar element names.

Here's a solution that makes use of extended regular expressions, which should work on both Linux (GNU Sed) and BSD/OSX platforms (BSD Sed):

sed -E 's%<([^>]*Name|[^>]*SSN|Address[^>]*)>[^<]*%<\1>***%g' file

Note:

It is important to match the variable parts of the element names with [^>]* rather than .* so as to ensure that the matches remain confined to the opening tag.
BSD/OSX extended regular expressions (in accordance with POSIX extended regular expressions) do not support backreferences inside the regular expression itself (as opposed to the "backreferences" that refer to capture-group matches in the replacement string), so no attempt is made to match the closing tag with one.
While this command works on the stated platforms, it is not POSIX-compliant, because POSIX only mandates support for basic regular expressions in Sed.
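
To see what the command does, here is a minimal sketch run against a made-up one-line sample. The element names (FirstName, StudentSSN, Address1) are assumptions for illustration; the original log content is not reproduced here. One thing worth checking on your own data: because [^>]* also matches a slash, a closing tag such as </FirstName> matches the pattern too, so the command as posted appends an extra *** after such closing tags. The second variant below is an assumption, not part of the answer; it uses [^>/]* to restrict the match to opening tags.

```shell
# Hypothetical one-line sample; the element names are assumptions for illustration.
sample='<FirstName>Test</FirstName><StudentSSN>123456789</StudentSSN><Address1>123 Test Street</Address1>'

# The command from the answer, unchanged.
as_posted=$(printf '%s\n' "$sample" |
  sed -E 's%<([^>]*Name|[^>]*SSN|Address[^>]*)>[^<]*%<\1>***%g')

# Variant (an assumption, not part of the answer): [^>/]* instead of [^>]*,
# so that closing tags such as </FirstName> are not matched as well.
opening_only=$(printf '%s\n' "$sample" |
  sed -E 's%<([^>/]*Name|[^>/]*SSN|Address[^>/]*)>[^<]*%<\1>***%g')

printf '%s\n' "$as_posted" "$opening_only"
```

If the elements in your log never share a line with their closing tags, the difference between the two variants may not matter.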

The above command is the equivalent of the following GNU Sed command using a basic regular expression - note the need to escape (, ), and |:

sed 's%<\([^>]*Name\|[^>]*SSN\|Address[^>]*\)>[^<]*%<\1>***%g' file

Note that it is the use of alternation (\|) that makes this command not portable, because POSIX basic regular expressions do not support it.
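
If portability to plain POSIX basic regular expressions matters, one way around the missing alternation is to issue one substitution per name pattern. The following is only a sketch under assumptions: the sample line and its element names (LastName, ParentSSN, Address2) are made up, and [^>/]* is used instead of the answer's [^>]* so that closing tags are left alone.

```shell
# Hypothetical sample line; the element names are assumptions for illustration.
sample='<LastName>Doe</LastName><ParentSSN>987654321</ParentSSN><Address2>Apt 4</Address2>'

# One basic-regex substitution per name pattern; no alternation required.
# [^>/]* (an assumption, the answer uses [^>]*) keeps closing tags from matching.
portable=$(printf '%s\n' "$sample" | sed \
  -e 's%<\([^>/]*Name\)>[^<]*%<\1>***%g' \
  -e 's%<\([^>/]*SSN\)>[^<]*%<\1>***%g' \
  -e 's%<\(Address[^>/]*\)>[^<]*%<\1>***%g')

printf '%s\n' "$portable"
```

The cost is one expression per pattern, which is still shorter than one line per concrete element name as long as the patterns stay general.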
