I Z

Reputation: 5927

bash sort / uniq -c: how to use tab instead of space as delimiter in output?

I have a file strings.txt listing strings, which I am processing like this:

sort strings.txt | uniq -c | sort -n > uniq.counts

The resulting file uniq.counts lists the unique strings sorted in ascending order by count, something like this:

 1 some string with    spaces
 5 some-other,string
25 most;frequent:string

Note that strings in strings.txt may contain spaces, commas, semicolons and other separators, except for the tab. How can I get uniq.counts to be in this format:

 1<tab>some string with    spaces
 5<tab>some-other,string
25<tab>most;frequent:string

Upvotes: 4

Views: 3305

Answers (3)

anubhava

Reputation: 785641

You can do:

sort strings.txt | uniq -c | sort -n | sed -E 's/^ *//; s/ /\t/' > uniq.counts

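sed first removes all leading spaces at the start of each line (before the count), then replaces the first remaining space, the one right after the count, with a tab character. Note that \t in the replacement is understood by GNU sed; other sed implementations may need a literal tab there instead.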
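As a minimal sketch of the same two-step substitution that does not depend on sed understanding \t, the tab can be produced by printf and spliced into the expression. The function name tabify_counts is hypothetical and only used for illustration:

```shell
# Hypothetical helper: read `uniq -c` output on stdin, write "count<TAB>string".
tabify_counts() {
    tab=$(printf '\t')   # literal tab, so the sed expression needs no \t escape
    # 1) strip the padding before the count, 2) turn the first space into a tab
    sed -e 's/^ *//' -e "s/ /${tab}/"
}

# Small sample instead of strings.txt: "b c" occurs twice, "a" once.
printf '%s\n' 'b c' 'a' 'b c' | sort | uniq -c | sort -n | tabify_counts
```

Because the second substitution has no g flag, only the space after the count is replaced; spaces inside the string itself are left alone.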

Upvotes: 5

David C. Rankin

Reputation: 84579

You can simply pipe the output of the sort/uniq pipeline to sed before writing to uniq.counts, e.g. add:

| sed -e 's/^\( *[0-9][0-9]*\) \(.*\)$/\1\t\2/' > uniq.counts

The full expression would be:

$ sort strings.txt | uniq -c | sort -n | \
sed -e 's/^\( *[0-9][0-9]*\) \(.*\)$/\1\t\2/' > uniq.counts

(line continuation included for clarity)

Upvotes: 3

Cyrus

Reputation: 88776

With GNU sed:

sort strings.txt | uniq -c | sort -n | sed -r 's/([0-9]) /\1\t/' > uniq.counts

Output to uniq.counts:

 1      some string with    spaces
 5      some-other,string
25      most;frequent:string

If you want to edit your file "in place" use sed's option -i.
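Since a tab and a run of spaces look the same in a terminal, one way to check the result is to split on the tab with cut, which uses tab as its default delimiter. The sketch below applies the same digit-then-space substitution to a single sample line, written in basic regex syntax with a literal tab so it does not assume GNU sed; the variable names are only illustrative:

```shell
# Sample line in the shape `uniq -c` produces: padded count, one space, the string.
line='     25 most;frequent:string'

# Replace the space after the last digit of the count with a literal tab.
tab=$(printf '\t')
converted=$(printf '%s\n' "$line" | sed "s/\([0-9]\) /\1${tab}/")

# cut splits on tabs by default, so field 2 is the untouched string
# and field 1 is the (still padded) count.
printf '%s\n' "$converted" | cut -f2
```

If the conversion worked, cut -f2 prints only the string; if the separator were still a space, cut would find no delimiter and print the whole line unchanged.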

Upvotes: 2
