How to replace repetitive string of variable length with another string in bash?

Question

I have files where missing data is inserted as '+'. So lines look like this:

substring1+++++substring2++++++++++++++substring3+substring4

I want to replace every run of more than 5 '+' with 'MISSING'. This makes the files more readable for my team and makes it easier to tell missing data apart from data entered as '+' (up to 5 is allowed). So far I have:

while read l; do
  echo "${l//['([+])\1{5}']/'MISSING'}"
done < /path/file.txt

but this replaces every '+' with 'MISSING'. I need it to say 'MISSING' just once.

Thanks in advance.

Wiktor Stribiżew · Accepted Answer

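You can't use regex in Bash parameter expansion. The pattern in ${var//pattern/replacement} is a glob pattern, so regex constructs such as \1 and {5} have no special meaning there.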
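
If staying inside Bash is preferred, extended globs can express "one or more" in a parameter expansion. The following is only a sketch: it assumes Bash with the extglob option enabled, and collapse_plus is a hypothetical helper name, not something from the question.

```bash
#!/usr/bin/env bash
# Sketch: collapse runs of '+' using parameter expansion only (no sed).
# extglob must be enabled before the pattern is used.
shopt -s extglob

# collapse_plus is a hypothetical helper name.
collapse_plus() {
  local line=$1
  # +(+) is an extended glob: one or more occurrences of a literal '+'.
  printf '%s\n' "${line//+(+)/MISSING}"
}

collapse_plus 'substring1+++++substring2++++++++++++++substring3+substring4'
# => substring1MISSINGsubstring2MISSINGsubstring3MISSINGsubstring4
```

Without extglob, the same text is treated as ordinary glob characters and the runs are not collapsed.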

In your loop, you may use

sed 's/+\{1,\}/MISSING/g' <<< "$l"

Or, you may use sed directly on the file

sed 's/+\{1,\}/MISSING/g' /path/file.txt

In the POSIX BRE pattern +\{1,\}, the + is a literal plus sign (it has no special meaning in BRE) and the interval \{1,\} repeats it one or more times.

A quick demo:

sed 's/+\{1,\}/MISSING/g' <<< "substring1+++++substring2++++++++++++++substring3+substring4"
# => substring1MISSINGsubstring2MISSINGsubstring3MISSINGsubstring4
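
The pattern above also replaces short runs, including the single + before substring4. If runs of up to five + should stay untouched, as the question describes, one possible variant raises the lower bound of the interval to 6. This is a sketch under that reading of the requirement; mark_missing is a hypothetical helper name.

```bash
# Replace only runs of six or more '+' (assumed reading of ">5" in the question).
# mark_missing is a hypothetical helper that filters standard input.
mark_missing() {
  sed 's/+\{6,\}/MISSING/g'
}

printf '%s\n' 'substring1+++++substring2++++++++++++++substring3+substring4' | mark_missing
# => substring1+++++substring2MISSINGsubstring3+substring4
```

The five-character run and the single + are left alone because neither reaches the minimum of six repetitions.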

If you need to make changes to the same file, use any technique described in "sed edit file in place".
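
As one example of such a technique, both GNU and BSD sed accept -i with an attached backup suffix, although -i is not part of POSIX. A minimal sketch, using a temporary file as a stand-in for /path/file.txt so it can be run safely:

```bash
# Sketch: edit a file in place, keeping a backup with the .bak suffix.
# The temporary file stands in for /path/file.txt.
file=$(mktemp)
printf '%s\n' 'substring1+++++substring2+substring3' > "$file"

# -i.bak (suffix attached, no space) is accepted by both GNU and BSD sed.
sed -i.bak 's/+\{1,\}/MISSING/g' "$file"

result=$(cat "$file")        # edited content
backup=$(cat "$file.bak")    # original content, preserved by -i.bak
printf '%s\n' "$result" "$backup"

rm -f "$file" "$file.bak"
```

Keeping the backup suffix makes it easy to restore the original if the substitution turns out to be too broad.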
