LCB

Reputation: 87

Strip leading AND trailing ansi/tput codes from string

I am "sanitizing" strings for inclusion in a log file. For the sake of argument, assume that 1) colorizing the string at runtime is appropriate, and 2) I need the leading and trailing spaces on screen but want the excess whitespace removed from the log.

Specifically, the output is tee-ed into a log file. Not every line is colorized, and not every line has leading or trailing spaces.

Given this, I want to:

  1. Remove all codes, both those that set the color and those that reset it. The reason for this will be apparent in a moment.
  2. Remove leading and trailing whitespace.

Searching for how to strip color codes in bash turns up many different approaches. What I have found so far, however, is that nobody seems to address the trailing reset, the $(tput sgr0). In the examples I have seen this is inconsequential, but my additional requirement to strip leading and trailing spaces makes removing it necessary: as long as any part of the reset sequence remains at the end of the string, the trailing spaces are not actually trailing.

Here is my example script which demonstrates the issue:

#!/bin/bash

# Create a string with color, leading spaces, trailing spaces, and a reset
REPLY="$(tput setaf 2)       This is green        $(tput sgr0)"
echo "Colored output:  $REPLY"
# Remove initial color code
REPLY="$(echo "$REPLY" | sed 's,\x1B\[[0-9;]*[a-zA-Z],,g')"
echo "De-colorized output:  $REPLY"
# Remove leading and trailing spaces if present
REPLY="$(printf "%s" "${REPLY#"${REPLY%%[![:space:]]*}"}" | sed -n -e 'l')"
echo "Leading spaces removed:  $REPLY"
REPLY="$(printf "%s" "${REPLY%"${REPLY##*[![:space:]]}"}" | sed -n -e 'l')"
echo "Trailing spaces removed:  $REPLY"

The output is shown below (I can't figure out how to color text here; assume the first line is green and the subsequent lines are not):

[screenshot of the script output]

I am willing to see the error of my ways, but after about three hours trying different things, I'm pretty sure my google-fu is failing me.

Thanks for any assistance.

Upvotes: 1

Views: 567

Answers (2)

sdht0

Reputation: 560

This works for me:

$ REPLY="$(tput setaf 2)       This is green        $(tput sgr0)"
$ echo -n $REPLY | od -vAn -tcx1
 033   [   3   2   m                               T   h   i   s
  1b  5b  33  32  6d  20  20  20  20  20  20  20  54  68  69  73
       i   s       g   r   e   e   n                            
  20  69  73  20  67  72  65  65  6e  20  20  20  20  20  20  20
     033   [   m 017
  20  1b  5b  6d  0f
$ REPLY=$(echo $REPLY | sed -r 's,\x1B[\[\(][0-9;]*[a-zA-Z]\s*(.*)\x1B[\[\(].*,\1,g' | sed 's/\s*$//')
$ echo -n $REPLY | od -vAn -tcx1
   T   h   i   s       i   s       g   r   e   e   n
  54  68  69  73  20  69  73  20  67  72  65  65  6e

Apparently sed does not support non-greedy matching, which would have made the second sed command unnecessary.

EDIT: This one should work for the input you have:

$ REPLY="$(tput setaf 2)       This is green        "$'\x1B'"(B$(tput sgr0)"
$ echo -n $REPLY | od -vAn -tcx1
 033   [   3   2   m                               T   h   i   s
  1b  5b  33  32  6d  20  20  20  20  20  20  20  54  68  69  73
       i   s       g   r   e   e   n                            
  20  69  73  20  67  72  65  65  6e  20  20  20  20  20  20  20
     033   (   B 033   [   m 017
  20  1b  28  42  1b  5b  6d  0f
$ REPLY=$(echo "$REPLY" | sed -r -e 's,\x1B[\[\(][0-9;]*[a-zA-Z]\s*([^\x1B]+)\s+\x1B.*,\1,g' -e 's,\s*$,,')
$ echo -n $REPLY | od -vAn -tcx1
   T   h   i   s       i   s       g   r   e   e   n
  54  68  69  73  20  69  73  20  67  72  65  65  6e

I find sed much less cryptic (or at least as readable as regular expressions can be) than bash substitutions. But that's just me :)

Upvotes: 0

Armali

Reputation: 19395

I am willing to see the error of my ways, …

The primary error is just that the sed command removes only the Esc[… control sequences, but not the Esc(B sequence, which is also part of sgr0. It works if you change it to

… | sed 's,\x1B[[(][0-9;]*[a-zA-Z],,g'
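As a quick check of that pattern, here is a minimal sketch. It assumes GNU sed (for the \x1B escape) and a terminal whose sgr0 expands to Esc(B followed by Esc[m. The strip_codes name is hypothetical. The od dump in the other answer also shows a trailing 017 (SI) byte on some terminals, which the pattern does not match, so the sketch deletes that byte with tr as an extra precaution.

```shell
#!/bin/bash
# strip_codes is a hypothetical helper name, not part of the original post.
# Assumes GNU sed, which understands \x1B as the Esc byte.
strip_codes() {
    # Drop Esc[... and Esc(... sequences, then any stray SI (octal 017) byte.
    sed 's,\x1B[[(][0-9;]*[a-zA-Z],,g' | tr -d '\017'
}

# Literal escape sequences stand in for $(tput setaf 2) and $(tput sgr0).
colored=$'\e[32m   This is green   \e(B\e[m'

# Dump the result byte by byte to confirm no Esc bytes are left.
printf '%s\n' "$colored" | strip_codes | od -c
```

The dump should contain only the spaces, the text, and the final newline.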

The secondary error is that the sed -n -e 'l' command appends a literal $ to the end of the line, so the former trailing spaces are no longer trailing and therefore are not removed.
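To illustrate this second point, the sketch below first shows what the l command prints, then trims a string with the same parameter expansions as in the question but without piping the result through sed -n -e 'l'. The trim name is hypothetical, and the input is assumed to have had its control sequences removed already.

```shell
#!/bin/bash
# The 'l' command prints the line unambiguously and marks its end with '$'.
printf 'abc   \n' | sed -n 'l'    # prints: abc   $

# trim is a hypothetical helper name, not part of the original post.
trim() {
    local s=$1
    s=${s#"${s%%[![:space:]]*}"}   # drop leading whitespace
    s=${s%"${s##*[![:space:]]}"}   # drop trailing whitespace
    printf '%s\n' "$s"
}

# Input here is assumed to be already free of control sequences.
trim '       This is green        '
```

Keeping the trimming in plain parameter expansion means nothing is appended to the string between the two steps.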

Upvotes: 1
