Reputation: 951
We want to remove ^[ and all of the escape sequences.
sed is not working and is giving us this error:
$ sed 's/^[//g' oldfile > newfile; mv newfile oldfile;
sed: -e expression #1, char 7: unterminated `s' command
$ sed -i '' -e 's/^[//g' somefile
sed: -e expression #1, char 7: unterminated `s' command
Upvotes: 95
Views: 87899
Reputation: 1562
These answers did not work for me.
I had a 40MB text file with ^B and ^C captured from an rs232 device. All my efforts to remove the ^B and ^C's failed.
What finally worked, removing all control characters including newlines (\n) and carriage returns (\r), was:
cat InputFile.txt | tr -d "[:cntrl:]" > OutputFile.txt
The tr -d "[:cntrl:]" deletes all the control characters from the output.
If you want to keep the newlines (\n) and carriage returns (\r), then one way is to remap \r and \n to \275 and \276 respectively, delete the control characters, and then remap the characters back to \r and \n as follows:
cat InputFile.txt | tr '\r\n' '\275\276' | tr -d "[:cntrl:]" | tr "\275\276" "\r\n" > OutputFile.txt
Note: if your file already contains \275 and \276 characters then look for different characters that aren't in the file.
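The remap-delete-remap pipeline can be checked on a tiny made-up sample before running it on a large file. This is only a sketch: the sample bytes are hypothetical, LC_ALL=C is an added assumption so that tr treats \275 and \276 as plain bytes, and od is used just to make the result visible.

```shell
# Hypothetical sample: "a", ^B, "b", ^C, "c", then CR LF.
# The control characters should vanish while CR LF survives.
cleaned_hex=$(printf 'a\002b\003c\r\n' \
  | LC_ALL=C tr '\r\n' '\275\276' \
  | LC_ALL=C tr -d '[:cntrl:]' \
  | LC_ALL=C tr '\275\276' '\r\n' \
  | od -An -tx1 | tr -d ' \n')

# Prints the surviving bytes as one hex string.
echo "$cleaned_hex"
```

If 0d 0a is still present at the end of the hex string, the placeholder bytes made the round trip intact.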
Upvotes: 1
Reputation: 309
You can use the astrp CLI tool for stripping ANSI escape codes. astrp is built on top of Alacritty's VTE parser which passes the input through a state machine. This approach should be more robust than replacing escape codes with regular expressions.
Upvotes: 0
Reputation: 3163
I managed with the following for my purposes, but this doesn't include all possible ANSI escapes:
sed -r 's/\x1b\[[0-9;]*m?//g'
This removes m commands, but for all escapes (as commented by @lethalman) use:
sed -r 's/\x1b\[[^@-~]*[@-~]//g'
Also see "https://stackoverflow.com/questions/7857352/python-regex-to-match-vt100-escape-sequences".
There is also a table of common escape sequences.
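To see how the two patterns differ, the following sketch runs both over a made-up sample containing a color sequence and an erase-line sequence (ESC[2K). It assumes GNU sed (for \x1b and -r); LC_ALL=C is an added assumption that keeps the @-~ range byte-based.

```shell
# Hypothetical sample: red text, reset, then "erase line" (ESC[2K).
sample=$(printf '\033[31mred\033[0m\033[2K')

# First pattern: the trailing "m" is optional, so ESC[2 is removed
# from the erase-line sequence but its final "K" is left behind.
sgr_only=$(printf '%s\n' "$sample" | LC_ALL=C sed -r 's/\x1b\[[0-9;]*m?//g')

# Second pattern: consumes everything up to and including the final
# byte of the sequence (any character from @ through ~).
all_csi=$(printf '%s\n' "$sample" | LC_ALL=C sed -r 's/\x1b\[[^@-~]*[@-~]//g')

printf '%s\n%s\n' "$sgr_only" "$all_csi"
```

The stray K in the first result is the kind of leftover the second pattern is meant to avoid.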
Upvotes: 23
Reputation: 4518
Tom Hale's answer left unwanted codes, but was a good base to work from. Adding additional filtering cleared out leftover, unwanted codes:
sed -e "s,^[[[(][0-9;?]*[a-zA-Z],,g" \
-e "s/^[[[][0-9][0-9]*[@]//" \
-e "s/^[[=0-9]<[^>]*>//" \
-e "s/^[[)][0-9]//" \
-e "s/.^H//g" \
-e "s/^M//g" \
-e "s/^^H//" \
file.dirty > file.clean
As this was done on a non-GNU version of sed, where you see ^[, ^H, and ^M, I used Ctrl-V <Esc>, Ctrl-V Ctrl-H, and Ctrl-V Ctrl-M respectively. The ^> is literally a caret (^) and a greater-than character, not Ctrl-<.
TERM=xterm was in use at the time.
To remove PCL codes, add patterns like this:
sed -e "s/^[[&()*][a-z]*[-+]*[0-9][0-9]*[A-Z]//" \
-e "s/^[[=9EZYz]//" \
file.dirty > file.clean
Ideally, if the regular expressions are used with an interpreter that understands the ? meta-character, the first pattern is better expressed as:
"s/^[[&()*][a-z]?[-+]?[0-9][0-9]*[A-Z]//" \
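If typing Ctrl-V sequences is inconvenient, the literal control characters can also be produced with printf and spliced into the sed script. The following is a minimal sketch of that idea, not the exact filter from this answer; the variable and function names are made up for illustration.

```shell
# Build the literal bytes once; command substitution keeps them because
# only trailing newlines are stripped.
esc=$(printf '\033')   # the byte shown as ^[ above
cr=$(printf '\r')      # the byte shown as ^M above

# strip_portable is a hypothetical helper: it removes CSI-style sequences
# and carriage returns without relying on GNU-only \x escapes.
strip_portable() {
  sed -e "s/${esc}\[[0-9;?]*[a-zA-Z]//g" \
      -e "s/${cr}//g"
}

out=$(printf '\033[1mbold\033[0m\r\n' | strip_portable)
echo "$out"
```

Because the control bytes live in shell variables, the script file itself stays plain ASCII and survives copy and paste.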
Upvotes: 2
Reputation: 11
This simple awk solution worked for me, try this:
str="happy $(tput setaf 1)new$(tput sgr0) year!" #colored text
echo "$str" | awk '{gsub("(.\\[[0-9]+m|.\\(..\\[m)","",$0)}1' #remove ansi colors
Upvotes: 1
Reputation: 2282
I stumbled upon this post when looking for a way to strip extra formatting from man pages. ansifilter did it, but the result was far from what I wanted (for example, all previously bold characters were duplicated, like SSYYNNOOPPSSIISS).
For that task the correct command is col -bx, for example:
groff -man -Tascii fopen.3 | col -bx > fopen.3.txt
Why this works: (in response to a comment by @AttRigh)
groff produces bold characters the way you would on a typewriter: print a letter, move one character back with backspace (you can't erase text on a typewriter), and print the same letter again to make the character more pronounced. So simply omitting backspaces produces "SSYYNNOOPPSSIISS". col -b fixes this by interpreting backspaces correctly; to quote the manual:
-b Do not output any backspaces, printing only the last character written to each column position.
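The overstrike behaviour can be reproduced without groff. The sketch below feeds a hand-made overstruck string through col -bx; the sample text is hypothetical, and the sed fallback for systems without col is an added assumption that only handles this simple letter, backspace, letter pattern.

```shell
# groff-style bold: each letter is printed, backspaced over, and printed again.
overstruck=$(printf 'S\bSY\bYN\bN')

if command -v col >/dev/null 2>&1; then
  # col keeps only the last character written to each column.
  plain=$(printf '%s\n' "$overstruck" | col -bx)
else
  # Fallback (not col): drop every "any character + backspace" pair.
  bs=$(printf '\b')
  plain=$(printf '%s\n' "$overstruck" | sed "s/.${bs}//g")
fi

echo "$plain"
```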
Upvotes: 12
Reputation: 650
My answer to "What are these weird ha:// URLs jenkins fills our logs with?" removes all ANSI escape sequences from Jenkins console log files effectively (it also deals with Jenkins-specific URLs which wouldn't be relevant here).
I acknowledge and appreciate the contributions of Marius Gedminas and pyjama from this thread in formulating the ultimate solution.
Upvotes: 1
Reputation: 2863
A sed-based approach that does not need the extended regular expressions enabled by -r:
sed 's/\x1B\[[0-9;]*[JKmsu]//g'
Upvotes: 6
Reputation: 1117
A bash snippet I've been using for stripping out (at least some) ANSI colors:
shopt -s extglob
while IFS='' read -r line; do
echo "${line//$'\x1b'\[*([0-9;])[Km]/}"
done
Upvotes: 1
Reputation: 393557
Are you looking for ansifilter?
There are two things you can do. You can enter the literal escape character (in bash) using keyboard entry:
sed 's/Ctrl-vEsc//g'
alternatively
sed 's/Ctrl-vCtrl-[//g'
Or you can use character escapes:
sed 's/\x1b//g'
or for all control characters:
sed 's/[\x01-\x1F\x7F]//g' # NOTE: zaps TAB character too!
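Deleting control characters alone removes the ESC byte but leaves the rest of each sequence (for example [0m) in the text, and the range above also removes TAB. A hedged variant, assuming GNU sed and a byte-oriented locale (LC_ALL=C is an addition here), skips \x09 so tabs survive; the sample input is made up:

```shell
# Hypothetical sample: a TAB, a stray ^A, and a color-reset sequence.
out=$(printf 'a\tb\001c\033[0m\n' \
  | LC_ALL=C sed 's/[\x01-\x08\x0B-\x1F\x7F]//g')

# TAB is kept and ^A and ESC are gone, but "[0m" is still there,
# so this is not a substitute for matching whole escape sequences.
printf '%s\n' "$out" | od -c
```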
Upvotes: 74
Reputation: 119
You can remove all non printable characters with this:
sed 's/[^[:print:]]//g'
Upvotes: 11
Reputation: 181
I don't have enough reputation to add a comment to the answer given by Luke H, but I did want to share the regular expression that I've been using to eliminate all of the ANSI escape sequences.
sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g'
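For context, \x01 and \x02 are the bytes readline uses to mark non-printing parts of a prompt, and ESC(B is a character-set selection that xterm-like terminals emit together with ESC[m when attributes are reset. The sketch below runs the expression over a made-up prompt string, assuming GNU sed; LC_ALL=C is an added assumption.

```shell
# Hypothetical prompt: bold on, the word "prompt", attributes reset, then "$ ".
sample=$(printf '\001\033[1m\002prompt\001\033(B\033[m\002$ ')

out=$(printf '%s\n' "$sample" \
  | LC_ALL=C sed -r 's~\x01?(\x1B\(B)?\x1B\[([0-9;]*)?[JKmsu]\x02?~~g')

echo "$out"
```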
Upvotes: 12
Reputation: 22415
I built vtclean for this. It strips escape sequences using these regular expressions in order (explained in regex.txt):
// handles long-form RGB codes
^\033](\d+);([^\033]+)\033\\
// excludes non-movement/color codes
^\033(\[[^a-zA-Z0-9@\?]+|[\(\)]).
// parses movement and color codes
^\033([\[\]]([\d\?]+)?(;[\d\?]+)*)?(.)
It additionally does basic line-edit emulation, so backspace and other movement characters (like left arrow key) are parsed.
Upvotes: 6
Reputation: 46963
commandlinefu gives the correct answer which strips ANSI colours as well as movement commands:
sed "s,\x1B\[[0-9;]*[a-zA-Z],,g"
Upvotes: 63
Reputation: 530
ansi2txt command (part of kbtin package) seems to be doing the job perfectly on Ubuntu.
Upvotes: 19
Reputation: 38641
Just a note: let's say you have a file like this (such line endings are generated by git remote reports):
echo -e "remote: * 27625a8 (HEAD, master) 1st git commit\x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: \x1b[K
remote: Current branch master is up to date.\x1b[K" > chartest.txt
As a hex dump, this looks like:
$ cat chartest.txt | hexdump -C
00000000 72 65 6d 6f 74 65 3a 20 2a 20 32 37 36 32 35 61 |remote: * 27625a|
00000010 38 20 28 48 45 41 44 2c 20 6d 61 73 74 65 72 29 |8 (HEAD, master)|
00000020 20 31 73 74 20 67 69 74 20 63 6f 6d 6d 69 74 1b | 1st git commit.|
00000030 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 |[K.remote: .[K.r|
00000040 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 65 6d 6f 74 |emote: .[K.remot|
00000050 65 3a 20 1b 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b |e: .[K.remote: .|
00000060 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 |[K.remote: .[K.r|
00000070 65 6d 6f 74 65 3a 20 43 75 72 72 65 6e 74 20 62 |emote: Current b|
00000080 72 61 6e 63 68 20 6d 61 73 74 65 72 20 69 73 20 |ranch master is |
00000090 75 70 20 74 6f 20 64 61 74 65 2e 1b 5b 4b 0a |up to date..[K.|
0000009f
It is visible that git here adds the sequence 0x1b 0x5b 0x4b before the line ending (0x0a).
Note that, while you can match the 0x1b with a literal format \x1b in sed, you CANNOT do the same for 0x5b, which represents the left square bracket [:
$ cat chartest.txt | sed 's/\x1b\x5b//g' | hexdump -C
sed: -e expression #1, char 13: Invalid regular expression
You might think you can escape the representation with an extra backslash \, which ends up as \\x5b; but while that "passes", it doesn't match anything as intended:
$ cat chartest.txt | sed 's/\x1b\\x5b//g' | hexdump -C
00000000 72 65 6d 6f 74 65 3a 20 2a 20 32 37 36 32 35 61 |remote: * 27625a|
00000010 38 20 28 48 45 41 44 2c 20 6d 61 73 74 65 72 29 |8 (HEAD, master)|
00000020 20 31 73 74 20 67 69 74 20 63 6f 6d 6d 69 74 1b | 1st git commit.|
00000030 5b 4b 0a 72 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 |[K.remote: .[K.r|
00000040 65 6d 6f 74 65 3a 20 1b 5b 4b 0a 72 65 6d 6f 74 |emote: .[K.remot|
...
So if you want to match this character, apparently you must write it as an escaped left square bracket, that is \[; the rest of the values can then be entered with escaped \x notation:
$ cat chartest.txt | sed 's/\x1b\[\x4b//g' | hexdump -C
00000000 72 65 6d 6f 74 65 3a 20 2a 20 32 37 36 32 35 61 |remote: * 27625a|
00000010 38 20 28 48 45 41 44 2c 20 6d 61 73 74 65 72 29 |8 (HEAD, master)|
00000020 20 31 73 74 20 67 69 74 20 63 6f 6d 6d 69 74 0a | 1st git commit.|
00000030 72 65 6d 6f 74 65 3a 20 0a 72 65 6d 6f 74 65 3a |remote: .remote:|
00000040 20 0a 72 65 6d 6f 74 65 3a 20 0a 72 65 6d 6f 74 | .remote: .remot|
00000050 65 3a 20 0a 72 65 6d 6f 74 65 3a 20 0a 72 65 6d |e: .remote: .rem|
00000060 6f 74 65 3a 20 43 75 72 72 65 6e 74 20 62 72 61 |ote: Current bra|
00000070 6e 63 68 20 6d 61 73 74 65 72 20 69 73 20 75 70 |nch master is up|
00000080 20 74 6f 20 64 61 74 65 2e 0a | to date..|
0000008a
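An alternative spelling that avoids the backslash altogether is to put the bracket in a bracket expression, [[], which matches a literal [. A small sketch on a one-line made-up sample, assuming GNU sed for \x1b:

```shell
# One hypothetical line ending in ESC [ K, as in the dump above.
stripped_hex=$(printf 'done.\033[K\n' \
  | sed 's/\x1b[[]K//g' \
  | od -An -tx1 | tr -d ' \n')

# Prints the remaining bytes as one hex string; 1b 5b 4b should be gone.
echo "$stripped_hex"
```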
Upvotes: 3