Replace text between pattern range on same line

Question

This may be a better task for awk than sed, but the goal is to parse a single, long string (it happens to be an XML doc) and replace text within a pattern range with another character.

I want to preserve the number of characters being replaced and simply mask them with asterisks. I've put something together in a Python script that parses the XML tree, but I have a feeling a native program is going to be much faster.

Assuming the string: "123123"

...I'd like the output: "***123"

My first attempt with sed without using ranges got me this:

$ echo "123123" | sed "s/[0-9]/*/g"
******

I learned that sed can operate within ranges, but my understanding is that the behavior can only be toggled from one line to the next, not partway through a single line.

Experimenting with pattern ranges got me the following (consistent with my understanding) and thus didn't work either:

$ echo "123123" | sed "//,// s/[0-9]/*/g" 
******

EDIT: In fact, even when the input contains line breaks, I must be misunderstanding the pattern range behavior (or my example is poorly constructed):

$ echo "123
123" | sed "//,// s/[0-9]/*/g" 
***
***

Any tips would be greatly appreciated.

Ed Morton · Accepted Answer

Never use range expressions: they make simple tasks very slightly briefer, but then need a complete rewrite or duplicate conditions when your requirements become marginally more interesting. If a range is necessary, always use a flag variable instead. What that means, of course, is that you can't use sed for problems like this, since it doesn't support variables.
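To illustrate the flag-variable idiom, here is a minimal sketch for the simpler line-oriented case. It assumes the range is delimited by lines consisting solely of BEGIN and END; those marker names and the function name mask_marked_lines are placeholders chosen for this example, not taken from the question.

```shell
# Sketch: mask digits on lines between hypothetical BEGIN and END marker lines.
# The "inside" flag replaces a range expression like /BEGIN/,/END/.
mask_marked_lines() {
  awk '
    $0 == "END"   { inside = 0 }            # leave the range before testing the flag
    inside        { gsub(/[0-9]/, "*") }    # only lines strictly between the markers
    $0 == "BEGIN" { inside = 1 }            # enter the range after testing the flag
    { print }
  '
}

printf '%s\n' a BEGIN 123 END 123 | mask_marked_lines
```

The order of the three rules decides whether the marker lines themselves are inside the range; changing that is a one-line move, where a range expression would need a duplicated condition.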

Anyway, here's a trivial GNU awk (for multi-char RS and RT) solution that doesn't directly use ranges at all:

$ cat file
Assuming the string: "123123" ...I'd like the

$ awk -v RS='' -v ORS= '{print gensub(/(.*).*/,"\\1***",1) RT}' file
Assuming the string: "***123" ...I'd like the

or if you need the number of *s to match the number of characters they're replacing:

$ cat file
Assuming  first string: "123123" ...I'd like the
Assuming second string: "1234567123" ...I'd like the

$ awk -v RS='' 'match($0,/(.*)(.*)/,a){ $0=a[1] gensub(/./,"*","g",a[2]) } {ORS=RT} 1' file
Assuming  first string: "***123" ...I'd like the
Assuming second string: "*******123" ...I'd like the
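If GNU awk is not available, the same length-preserving masking can be sketched in POSIX awk with index() and substr(). This assumes the delimiters are fixed literal strings passed as arguments; the `<m>` and `</m>` markers and the function name mask_between below are placeholders invented for the example, so substitute whatever delimiters your data really uses.

```shell
# Sketch: mask every character between two literal markers on the same line,
# keeping the markers and the character count. Marker strings are hypothetical.
mask_between() {
  # $1 = opening marker, $2 = closing marker (literal strings without backslashes)
  awk -v beg="$1" -v fin="$2" '
  {
    out = ""; rest = $0
    while ((s = index(rest, beg)) > 0) {
      head = substr(rest, 1, s - 1 + length(beg))   # text up to and including the opener
      tail = substr(rest, s + length(beg))          # everything after the opener
      e = index(tail, fin)
      if (e == 0) break                             # unclosed opener: leave the rest alone
      body = substr(tail, 1, e - 1)
      gsub(/./, "*", body)                          # one asterisk per masked character
      out = out head body fin
      rest = substr(tail, e + length(fin))
    }
    print out rest
  }'
}

printf '%s\n' '<m>123</m>123' '<m>1234567</m>123' | mask_between '<m>' '</m>'
```

With those placeholder markers, the two input lines come out as `<m>***</m>123` and `<m>*******</m>123`. Because the markers are matched with index() rather than a regex, characters such as `.` or `*` in a delimiter need no escaping.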
