ravi

Reputation: 207

Remove multiple Spaces - Unix Script

From my shell script I am trying to remove repeated whitespace, but it seems that

tr -s " " < input.txt > output.txt

is not working. Is there another way to reduce multiple consecutive spaces to a single space from a shell script?

I am trying to remove the blank spaces from this content

1     | First         | PO BOX 123        | DAYTON          | OH            | 3432-222
2     | Second        | PO BOX 2223       | CALIFORNIA      | CA            | 23423 
3     | THIRD         | PO BOX 21         | COLUMBUS        | OH            | 2223

into this

1|First|PO BOX 123|DAYTON|OH|3432-222
2|Second|PO BOX 2223|CALIFORNIA|CA|23423
3|THIRD|PO BOX 21|COLUMBUS|OH|2223

Upvotes: 1

Views: 2027

Answers (4)

Niels Van Steen

Reputation: 308

I had to do something like this with the /etc/services file.

None of the sed methods worked for me (on this question and many others).

tr -s " " also did nothing. tr -s "\t" removed some of the whitespace, but piping that into tr -s " " did nothing either.

A solution I found was using 'column -t'

 column -t /etc/services | tr -s " " 

As I understand it (I may be wrong), column -t works out the number of columns from the input itself and lays the text out as a table, padding the columns with spaces. After that, tr -s " " can squeeze each run of spaces down to a single space.
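A likely reason tr -s " " seemed to do nothing is that the padding is a mix of tabs and spaces (an assumption, not verified against any particular /etc/services). A minimal sketch of an alternative that squeezes both kinds of blank in one pass, using the [:blank:] character class; the sample line is made up for illustration:

```shell
# Sample line with mixed tab and space padding (made-up data, not from a real file).
line="$(printf 'ftp\t\t  21/tcp \t File Transfer')"

# Translate every blank (space or tab) to a space, then squeeze runs of spaces to one.
squeezed="$(printf '%s\n' "$line" | tr -s '[:blank:]' ' ')"

printf '%s\n' "$squeezed"
```

Unlike column -t, this does not realign anything; it only normalizes the separators.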

Upvotes: 0

ghoti

Reputation: 46846

I like using awk for things that have records. tr translates text, sed is a stream editor, but awk understands the concept of records, fields, field separators, etc.

So to complete your set of options, here's a solution in minimal awk:

$ awk -F ' *\\| *' '{$1=$1} 1' OFS='|' input.txt
1|First|PO BOX 123|DAYTON|OH|3432-222
2|Second|PO BOX 2223|CALIFORNIA|CA|23423
3|THIRD|PO BOX 21|COLUMBUS|OH|2223

This sets an input field separator with -F and an output field separator with OFS. The script consists of a statement which causes the record to be rewritten with OFS, and a statement (the 1 shortcut) to print the line.
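To see why the rewrite statement matters, here is a small sketch that runs the same separators with and without it on one sample record. OFS is passed with -v here rather than as a trailing assignment, which should be equivalent for this purpose; the variable names are just for illustration.

```shell
# One sample record in the question's format.
record='1     | First         | PO BOX 123'

# Without modifying a field, awk prints $0 untouched, so OFS is never applied.
unchanged="$(printf '%s\n' "$record" | awk -F ' *[|] *' -v OFS='|' '1')"

# Assigning $1 to itself makes awk rebuild $0 by joining the fields with OFS.
rebuilt="$(printf '%s\n' "$record" | awk -F ' *[|] *' -v OFS='|' '{$1=$1} 1')"

printf '%s\n%s\n' "$unchanged" "$rebuilt"
```

The first line printed is the original record with its padding intact; the second is the compact form.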

Note the weird escaping of the vertical bar in the -F option. If you were to use this, to avoid confusion, you might want to use awk -F ' *[|] *' ... instead.

To be even shorter at the expense of clarity, you might also use:

$ awk -F ' *[|] *' '$1=$1' OFS='|' input.txt

This turns the record rewrite statement into a condition, which is true whenever the first field is non-empty and non-zero (as it is for this data), thus eliminating the need for the 1 shortcut. A line whose first field is empty or 0 would silently not be printed. While it shaves a few characters off the script, I include it only for fun; much better to use code that doesn't make you scratch your head when you re-read it in a year or two. ;)

Upvotes: 1

agc

Reputation: 8406

Using minimal sed:

sed 's/ *| */|/g' input.txt 

Output:

1|First|PO BOX 123|DAYTON|OH|3432-222
2|Second|PO BOX 2223|CALIFORNIA|CA|23423 
3|THIRD|PO BOX 21|COLUMBUS|OH|2223

Note: This is essentially the same approach as PaulProgrammer's answer, but simplified because the whitespace in input.txt consists purely of space (" ") characters, with no tabs or other blanks.
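The second output line above still ends in a space, because the substitution only touches spaces next to a bar. If that matters, a second expression can trim it; a minimal sketch, assuming spaces are the only trailing whitespace:

```shell
# Second sample row from the question, including its trailing space.
row='2     | Second        | PO BOX 2223       | CALIFORNIA      | CA            | 23423 '

# First expression collapses the padding around each bar;
# second expression strips any spaces left at the end of the line.
cleaned="$(printf '%s\n' "$row" | sed -e 's/ *| */|/g' -e 's/ *$//')"

# Brackets make a leftover trailing space visible, if there were one.
printf '[%s]\n' "$cleaned"
```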

Upvotes: 3

PaulProgrammer

Reputation: 17630

Try using sed instead of tr:

sed 's/[[:blank:]]\{1,\}|[[:blank:]]\{1,\}/|/g' < input > output

or, in perl instead:

perl -ne 's#\s+\|\s+#|#g; print;' input > output
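One thing to be aware of: \{1,\} (like \s+ in the perl version) requires at least one blank on both sides of the bar, so a bar with padding on only one side, or none, is left as it is. A small sketch comparing it with a * variant that accepts zero or more blanks; the sample row is made up to show the difference:

```shell
# Made-up row: the first bar is padded on both sides, the others are not.
row="$(printf 'A\t | B|C |D')"

# \{1,\} needs a blank on BOTH sides, so only the first bar is rewritten.
strict="$(printf '%s\n' "$row" | sed 's/[[:blank:]]\{1,\}|[[:blank:]]\{1,\}/|/g')"

# * allows zero or more blanks on each side, so every bar is normalized.
loose="$(printf '%s\n' "$row" | sed 's/[[:blank:]]*|[[:blank:]]*/|/g')"

printf '%s\n%s\n' "$strict" "$loose"
```

For the data in the question every bar is padded on both sides, so both forms give the same result there.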

Upvotes: 2
