jerik

Reputation: 5767

remove whitespace from piped output

In a text file I have some tags with the notation :foo. To get an overview of the tags in the file, I want to list all of them.

This is done via

grep -o -e ":[a-z]*\( \|$\)" file.txt | sort | uniq

Now I get duplicates, because some matches end with a space and others end at the newline:

:movie  <-- only newline
:movie  <-- whitespace and newline
:read
:read 

I want to avoid the duplicates but could not figure out how. I tried appending | tr -d '[:space:]', but that only concatenates all of the piped output into a single line.

Example of file.txt:

Avengers: Infinity War :movie
Yojimbo 1961 :movie nippon

Upvotes: 0

Views: 818

Answers (4)

perreal

Reputation: 97948

You can use a Perl-compatible regex (the -P option of GNU grep) and match word characters:

grep -oP ':\w+' file.txt | sort | uniq

Or just match one or more non-space characters after the colon, so that a bare colon (as in "Avengers:") is not reported:

grep -oE ':[^ ]+' file.txt | sort | uniq
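
As a quick check, here is a sketch that runs the non-space match against the two sample lines from the question. It assumes a grep that supports -o and -E, and it requires at least one character after the colon (an adjustment assumed here), so the bare colon in "Avengers:" is not reported as a tag.

```shell
# The two sample lines from the question, piped in instead of read from file.txt.
sample='Avengers: Infinity War :movie
Yojimbo 1961 :movie nippon'

# -o prints only the matched text; '+' demands at least one non-space
# character after the colon, so "Avengers:" yields no match.
tags=$(printf '%s\n' "$sample" | grep -oE ':[^ ]+' | sort -u)

printf '%s\n' "$tags"
```

Here sort -u stands in for sort | uniq; both leave one copy of each distinct tag.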

Upvotes: 1

James Brown

Reputation: 37404

Some test lines (the first :space is followed by a trailing space):

$ cat file
with :space 
with :space too
without :space
test: this

With grep, sort and uniq:

$ grep -o ":[a-z]\+" file | sort | uniq
:space

With awk (well, gawk and mawk at least):

$ awk 'BEGIN{RS="[" FS "|" RS "]+"}/:[a-z]/&&!a[$0]++' file
:space

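Each word becomes its own record, and we print the first occurrence of every record that contains a colon followed by a lowercase letter. RS="[" FS "|" RS "]+" could be written more simply, but this form emphasizes that a record ends at any run of the default FS (a space) and RS (a newline) characters. Note that inside the brackets the | is a literal character, so it acts as a separator too.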
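
To see what that record separator does, the following sketch prints every record first and then applies the filter. It spells the separator as "[ \n]+" (space and newline only) and anchors the pattern at the start of the record; both are simplifications assumed here. Like the command above, it relies on an awk that treats a multi-character RS as a regex, such as gawk or mawk.

```shell
# Test data from above; the first line has a space after ":space".
make_sample() {
    printf 'with :space \nwith :space too\nwithout :space\ntest: this\n'
}

# Step 1: every run of spaces and newlines ends a record, so each word
# is numbered and printed as its own record.
make_sample | awk 'BEGIN { RS = "[ \n]+" } { print NR ": " $0 }'

# Step 2: keep records that start with a colon plus a lowercase letter,
# and print each distinct one only the first time it is seen.
tags=$(make_sample | awk 'BEGIN { RS = "[ \n]+" } /^:[a-z]/ && !seen[$0]++')

printf '%s\n' "$tags"
```

Because the trailing space is consumed as part of the separator, ":space" and ":space " end up as the same record, which is why no separate trimming step is needed.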

Upvotes: 2

ctac_

Reputation: 2471

You can try sed. Note that this prints only the last tag on each line and skips lines that contain no tag:

sed -n 's/.*\(:[a-z][a-z]*\).*/\1/p' file.txt | sort | uniq

Upvotes: 0

RavinderSingh13

Reputation: 133458

You haven't provided a sample Input_file, so I couldn't test this, and I don't have zsh available. Try the following and let me know if it helps.

awk '/:[a-z]*/{sub(/ +$/,"");} !a[$0]++' Input_file | sort
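
If the original grep pipeline is kept, the trailing whitespace can instead be trimmed from each output line before sorting. tr -d '[:space:]' cannot do this, because it also deletes the newlines that separate the matches. A minimal sketch, assuming the two sample lines from the question and a grep that supports -o and -E; the pattern requires at least one letter after the colon, which is an adjustment assumed here:

```shell
# The two sample lines from the question, piped in instead of read from file.txt.
sample='Avengers: Infinity War :movie
Yojimbo 1961 :movie nippon'

# grep emits ":movie" (tag at end of line) and ":movie " (tag followed by a
# space); sed strips trailing whitespace per line and keeps the newlines.
tags=$(printf '%s\n' "$sample" \
    | grep -oE ':[a-z]+( |$)' \
    | sed 's/[[:space:]]*$//' \
    | sort -u)

printf '%s\n' "$tags"
```

Trimming per line keeps one tag per line, so sort -u can collapse the duplicates.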

Upvotes: 0
