hasan

Reputation: 951

How to remove ^[, and all of the ANSI escape sequences in a file using linux shell scripting

We want to remove ^[, and all of the escape sequences.

sed is not working and is giving us this error:

$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command

$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command

Upvotes: 95

Views: 87899

Answers (16)

Owl

Reputation: 1562

These answers did not work for me.

I had a 40MB text file with ^B and ^C captured from an rs232 device. All my efforts to remove the ^B and ^C's failed.

What finally worked, removing all control characters including newlines (\n) and carriage returns (\r), was:

cat InputFile.txt | tr -d "[:cntrl:]" > OutputFile.txt

The tr -d "[:cntrl:]" deletes all the control characters from the output.

If you want to keep the newlines (\n) and carriage returns (\r), one way is to remap \r and \n to \275 and \276 respectively, delete the control characters, and then remap the characters back to \r and \n as follows:

cat InputFile.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n" > OutputFile.txt

Note: if your file already contains \275 and \276 characters then look for different characters that aren't in the file.
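Another way to keep line endings without the remapping step is to list the bytes to delete explicitly. This is a minimal sketch, assuming a tr that accepts octal ranges (POSIX specifies them); the function name strip_ctrl is only illustrative:

```shell
# strip_ctrl is a hypothetical helper name, not part of any tool.
# It deletes ASCII control bytes except TAB (\011), LF (\012) and CR (\015).
strip_ctrl() {
    LC_ALL=C tr -d '\000-\010\013\014\016-\037\177'
}

# Example: ^B (\002) and ^C (\003) are removed, the CR LF line ending is kept.
printf 'a\002b\003c\r\n' | strip_ctrl | od -c
```

Like tr -d "[:cntrl:]", this removes only the ESC byte of an ANSI sequence, so the printable rest of the sequence (for example [31m) stays in the output.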

Upvotes: 1

Tim Nieradzik

Reputation: 309

You can use the astrp CLI tool for stripping ANSI escape codes. astrp is built on top of Alacritty's VTE parser which passes the input through a state machine. This approach should be more robust than replacing escape codes with regular expressions.

Upvotes: 0

Luke H

Reputation: 3163

I managed with the following for my purposes, but this doesn't include all possible ANSI escapes:

sed -r 's/\x1b\[[0-9;]*m?//g'

This removes m commands, but for all escapes (as commented by @lethalman) use:

sed -r 's/\x1b\[[^@-~]*[@-~]//g'

Also see "https://stackoverflow.com/questions/7857352/python-regex-to-match-vt100-escape-sequences".

There is also a table of common escape sequences.
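To see how the two patterns differ, here is a small sketch on a made-up sample. It assumes a POSIX shell and a sed that accepts -E (the more widely supported spelling of -r). The ESC byte is built with printf rather than \x1b, and the helper names are illustrative only:

```shell
esc=$(printf '\033')   # literal ESC byte, standing in for \x1b

# Made-up sample: hide-cursor sequence ESC[?25l followed by coloured text.
sample() { printf '\033[?25lplain \033[1;31mred\033[0m\n'; }

# First pattern: handles the "m" (colour) commands only.
strip_sgr() { LC_ALL=C sed -E "s/${esc}\[[0-9;]*m?//g"; }

# Second pattern: everything from ESC[ up to the final byte (@ through ~).
strip_csi() { LC_ALL=C sed "s/${esc}\[[^@-~]*[@-~]//g"; }

sample | strip_sgr   # prints: ?25lplain red
sample | strip_csi   # prints: plain red
```

The first pattern stops at the ?, so the tail of the cursor sequence leaks into the text. The second consumes the whole sequence.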

Upvotes: 23

kbulgrien

Reputation: 4518

Tom Hale's answer left unwanted codes, but was a good base to work from. Adding additional filtering cleared out leftover, unwanted codes:

sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
    -e "s/^[[[][0-9][0-9]*[@]//" \
    -e "s/^[[=0-9]<[^>]*>//" \
    -e "s/^[[)][0-9]//" \
    -e "s/.^H//g" \
    -e "s/^M//g" \
    -e "s/^^H//" \
        file.dirty > file.clean

As this was done on a non-GNU version of sed, where you see ^[, ^H, and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H, and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and greater-than character, not Ctrl-<.

TERM=xterm was in use at the time.

To remove PCL codes, add patterns like this:

sed -e "s/^[[&()*][a-z]*[-+]*[0-9][0-9]*[A-Z]//" \
    -e "s/^[[=9EZYz]//" \
        file.dirty > file.clean

Ideally, if the regular expressions are used with an interpreter that understands the ? meta-character, the first pattern is better expressed as:

      "s/^[[&()*][a-z]?[-+]?[0-9][0-9]*[A-Z]//" \
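If typing Ctrl-V sequences is inconvenient, the control characters can also be generated with printf and interpolated into the sed script, which should work on non-GNU sed as well. This sketch covers only the backspace and carriage-return patterns from the first command; the variable and function names are illustrative:

```shell
bs=$(printf '\b')   # literal ^H (backspace)
cr=$(printf '\r')   # literal ^M (carriage return)

# strip_overstrike is a hypothetical helper: drop each "character + backspace"
# pair, then drop carriage returns, mirroring s/.^H//g and s/^M//g.
strip_overstrike() {
    sed -e "s/.${bs}//g" -e "s/${cr}//g"
}

printf 'S\bSY\bY\r\n' | strip_overstrike   # prints: SY
```

The same trick works for ESC: esc=$(printf '\033') gives a literal ^[ that can be used in place of the Ctrl-V <Esc> entry.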

Upvotes: 2

Isuru Sampath

Reputation: 11

This simple awk solution worked for me, try this:

str="happy $(tput setaf 1)new$(tput sgr0) year!" #colored text
echo "$str" | awk '{gsub("(.\\[[0-9]+m|.\\(..\\[m)","",$0)}1' #remove ansi colors

Upvotes: 1

gronostaj

Reputation: 2282

I stumbled upon this post when looking for a way to strip extra formatting from man pages. ansifilter did it, but the output was far from the desired result (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).

For that task the correct command would be col -bx, for example:

groff -man -Tascii fopen.3 | col -bx > fopen.3.txt

(source)

Why this works: (in response to a comment by @AttRigh)

groff produces bold characters like you would on a typewriter: print a letter, move one character back with backspace (you can't erase text on a typewriter), print the same letter again to make the character more pronounced. So simply omitting backspaces produces "SSYYNNOOPPSSIISS". col -b fixes this by interpreting backspaces correctly, quote from the manual:

-b Do not output any backspaces, printing only the last character written to each column position.

Upvotes: 12

Frank Hoeflich

Reputation: 650

My answer to "What are these weird ha:// URLs jenkins fills our logs with?" removes all ANSI escape sequences from Jenkins console log files effectively (it also deals with Jenkins-specific URLs, which wouldn't be relevant here).

I acknowledge and appreciate the contributions of Marius Gedminas and pyjama from this thread in formulating the ultimate solution.

Upvotes: 1

palik

Reputation: 2863

A sed-based approach that does not need the extended regular expressions enabled by -r:

sed 's/\x1B\[[0-9;]*[JKmsu]//g'

Upvotes: 6

rdesgroppes

Reputation: 1117

A bash snippet I've been using for stripping out (at least some) ANSI colors:

shopt -s extglob
while IFS='' read -r line; do
  echo "${line//$'\x1b'\[*([0-9;])[Km]/}"
done

Upvotes: 1

sehe

Reputation: 393557

Are you looking for ansifilter?


Otherwise, there are two things you can do. The first is to enter the literal escape character (in bash):

Using keyboard entry:

sed 's/Ctrl-vEsc//g'

alternatively

sed 's/Ctrl-vCtrl-[//g'

Or you can use character escapes:

sed 's/\x1b//g'

or for all control characters:

sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!

Upvotes: 74

pyjama

Reputation: 119

You can remove all non-printable characters with this:

sed 's/[^[:print:]]//g'
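Note that for an escape sequence this deletes only the non-printable ESC byte, and the printable remainder stays in the text. The sketch below shows the effect on a made-up sample. It also shows one possible workaround: strip the sequences first, here with the [0-9;]*[a-zA-Z] pattern that appears elsewhere in this thread. The helper names are illustrative:

```shell
esc=$(printf '\033')   # literal ESC byte

# The command from this answer on its own.
strip_nonprint() { LC_ALL=C sed 's/[^[:print:]]//g'; }

# Hypothetical combination: remove CSI sequences, then other non-printables.
strip_both() {
    LC_ALL=C sed -e "s/${esc}\[[0-9;]*[a-zA-Z]//g" -e 's/[^[:print:]]//g'
}

printf '\033[31mred\033[0m\n' | strip_nonprint   # prints: [31mred[0m
printf '\033[31mred\033[0m\n' | strip_both       # prints: red
```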

Upvotes: 11

AGipson

Reputation: 181

I don't have enough reputation to add a comment to the answer given by Luke H, but I did want to share the regular expression that I've been using to eliminate all of the ANSI escape sequences.

sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'

Upvotes: 12

lunixbochs

Reputation: 22415

I built vtclean for this. It strips escape sequences using these regular expressions in order (explained in regex.txt):

// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\

// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).

// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)

It additionally does basic line-edit emulation, so backspace and other movement characters (like left arrow key) are parsed.

Upvotes: 6

Tom Hale

Reputation: 46963

commandlinefu gives the correct answer which strips ANSI colours as well as movement commands:

sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
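Since the question asks about cleaning a file in place, here is a minimal sketch that wraps this pattern in a function. It assumes mktemp is available and builds the ESC byte with printf so it does not depend on sed understanding \x1B; strip_ansi_file is a hypothetical name:

```shell
# strip_ansi_file FILE: rewrite FILE without CSI escape sequences.
# Hypothetical helper; uses a temporary file instead of sed -i, whose
# syntax differs between GNU and BSD sed.
strip_ansi_file() {
    file=$1
    esc=$(printf '\033')            # literal ESC byte
    tmp=$(mktemp) || return 1
    # Only overwrite the original if sed succeeded.
    sed "s,${esc}\[[0-9;]*[a-zA-Z],,g" "$file" > "$tmp" && cat "$tmp" > "$file"
    status=$?
    rm -f "$tmp"
    return "$status"
}
```

Usage would be strip_ansi_file somefile. Writing back with cat rather than mv keeps the file's owner and permissions.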

Upvotes: 63

soorajmr

Reputation: 530

The ansi2txt command (part of the kbtin package) seems to do the job perfectly on Ubuntu.

Upvotes: 19

sdaau

Reputation: 38641

Just a note; let's say you have a file like this (such line endings are generated by git remote reports):

echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt

In binary, this looks like this:

$ cat chartest.txt | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
00000050  65 3a 20 1b 5b 4b 0a 72  65 6d 6f 74 65 3a 20 1b  |e: .[K.remote: .|
00000060  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000070  65 6d 6f 74 65 3a 20 43  75 72 72 65 6e 74 20 62  |emote: Current b|
00000080  72 61 6e 63 68 20 6d 61  73 74 65 72 20 69 73 20  |ranch master is |
00000090  75 70 20 74 6f 20 64 61  74 65 2e 1b 5b 4b 0a     |up to date..[K.|
0000009f

It is visible that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).

Note that - while you can match the 0x1b with a literal format \x1b in sed, you CANNOT do the same for 0x5b, which represents the left square bracket [:

$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression

You might think you can escape the representation with an extra backslash \ - which ends up as \\x5b; but while that "passes" - it doesn't match anything as intended:

$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 1b  | 1st git commit.|
00000030  5b 4b 0a 72 65 6d 6f 74  65 3a 20 1b 5b 4b 0a 72  |[K.remote: .[K.r|
00000040  65 6d 6f 74 65 3a 20 1b  5b 4b 0a 72 65 6d 6f 74  |emote: .[K.remot|
...

So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[ - the rest of the values can then be entered with escaped \x notation:

$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000  72 65 6d 6f 74 65 3a 20  2a 20 32 37 36 32 35 61  |remote: * 27625a|
00000010  38 20 28 48 45 41 44 2c  20 6d 61 73 74 65 72 29  |8 (HEAD, master)|
00000020  20 31 73 74 20 67 69 74  20 63 6f 6d 6d 69 74 0a  | 1st git commit.|
00000030  72 65 6d 6f 74 65 3a 20  0a 72 65 6d 6f 74 65 3a  |remote: .remote:|
00000040  20 0a 72 65 6d 6f 74 65  3a 20 0a 72 65 6d 6f 74  | .remote: .remot|
00000050  65 3a 20 0a 72 65 6d 6f  74 65 3a 20 0a 72 65 6d  |e: .remote: .rem|
00000060  6f 74 65 3a 20 43 75 72  72 65 6e 74 20 62 72 61  |ote: Current bra|
00000070  6e 63 68 20 6d 61 73 74  65 72 20 69 73 20 75 70  |nch master is up|
00000080  20 74 6f 20 64 61 74 65  2e 0a                    | to date..|
0000008a

Upvotes: 3
