bmikolaj

Reputation: 486

Identify and remove specific hidden characters from text file

I have a text file that contains several hidden characters. Using cat -v, I can see that they include the following:

^M

^[[A

There are also \n characters at the end of the line. I would like to be able to display these as well somehow.

Then I would like to be able to selectively remove these hidden characters with tools such as cut and sed. How would I go about accomplishing this?

I've tried dos2unix, but that didn't remove any of the ^M characters. I've also tried sed s/^M//g, where I entered the ^M by pressing ctrl+v m.


Raw data

Output from cat -v on the raw data, also available at: http://pastebin.com/Vk2i81JC

^MCopying non-tried blocks... Pass 1 (forwards)^M^[[A^[[A^[[Arescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
^MFinished

Output wanted

Also available at: http://pastebin.com/wfDnrELm

rescued:         0 B,  errsize:       0 B,  current rate:        0 B/s
   ipos:         0 B,   errors:       0,    average rate:        0 B/s
   opos:         0 B, run time:       1 s,  successful read:       1 s ago
Finished

Upvotes: 5

Views: 17527

Answers (1)

Ram

Reputation: 1115

Try the tr command below, which translates or deletes characters. It deletes every character other than the ones specified in octal within the quotes.

Octal \12 is newline (\n), octal \11 is TAB (^I), and octal \40-\176 are the printable ASCII characters (space through ~), which are the ones worth keeping.

For a complete reference of octal values refer to this page: https://courses.engr.illinois.edu/ece390/books/labmanual/ascii-code-table.html
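
As a quick check of how the cat -v notation maps to octal codes, here is a small sketch. It relies only on printf's octal escapes and on cat -v, the same tool used in the question.

```shell
# ^M in cat -v output is the carriage return, octal 15.
printf '\15' | cat -v; echo

# ^[ is the ESC byte, octal 33; it starts the ^[[A (cursor up) sequence.
printf '\33' | cat -v; echo
```

Both codes fall outside \11, \12 and \40-\176, so tr -cd with that set deletes them.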

tr -cd '\11\12\40-\176' < org.txt > new.txt

The file new.txt will contain the contents of org.txt with those characters removed.
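
To see what this filter keeps and what it drops, here is a minimal sketch on a made-up one-line sample (the sample string is only an illustration, not taken from the question's data). cat -vet is used to display the result: it shows a TAB as ^I and marks each newline with a $ at the end of the line.

```shell
# Sample containing a carriage return (\r), a cursor-up sequence (ESC [ A), and a tab.
# tr keeps TAB (\11), newline (\12), and printable ASCII (\40-\176) and deletes the rest.
printf 'a\rb\033[Ac\td\n' | tr -cd '\11\12\40-\176' | cat -vet
```

This should print ab[Ac^Id$. Note that only the ESC byte of the escape sequence is deleted; the [A that follows it is printable ASCII and survives, so escape sequences need separate handling.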

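To remove the text between the two ^M characters on a line, strip the ^[[A escape sequences, and delete any remaining control characters, use the command below (the \r and \x1b escapes assume GNU sed). The second sed expression is needed because tr deletes only the ESC byte and would otherwise leave a printable [A behind.

sed 's/\r.*\r//g; s/\x1b\[[0-9;]*[A-Za-z]//g' org.txt | tr -cd '\11\12\40-\176' > new.txt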
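
As a rough check against the question's data, the sketch below rebuilds a shortened version of the first and last raw lines with printf and runs a sed and tr pipeline over it. The scratch directory, the shortened sample text, and the esc and cr variables are assumptions made for this illustration. The sed script uses literal CR and ESC bytes instead of the \r escape, so it should not depend on GNU sed, and it includes an expression for ESC [ sequences so that no [A is left in the output.

```shell
tmp=$(mktemp -d)   # scratch directory so no existing file is overwritten

# Shortened first and last lines of the raw data: \r is ^M and \033[A is ^[[A.
printf '\rCopying non-tried blocks... Pass 1 (forwards)\r\033[A\033[A\033[Arescued: 0 B\n\rFinished\n' > "$tmp/org.txt"

esc=$(printf '\033')   # literal ESC byte
cr=$(printf '\r')      # literal carriage return

# 1st expression: drop everything from the first CR to the last CR on a line.
# 2nd expression: drop ESC [ ... letter sequences such as cursor up (ESC [ A).
# tr then deletes whatever control characters remain, e.g. a lone CR.
sed "s/${cr}.*${cr}//; s/${esc}\[[0-9;]*[A-Za-z]//g" "$tmp/org.txt" |
  tr -cd '\11\12\40-\176' > "$tmp/new.txt"

# Show the result with line ends marked by $.
cat -vet "$tmp/new.txt"
```

On this sample the result should be the two lines rescued: 0 B$ and Finished$, which matches the shape of the wanted output.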

Upvotes: 8
