Reputation: 5767
In a text file I have some tags written in the notation :foo. To get an overview of the tags in the file, I want to list all of them.
This is done via
grep -o -e ":[a-z]*\( \|$\)" file.txt | sort | uniq
Now I get duplicates because of the whitespace or newline character at the end.
:movie <-- only newline
:movie <-- whitespace and newline
:read
:read
I want to avoid the duplicates, but I could not figure out how. I tried appending | tr -d '[:space:]', but that just concatenates all of the pipe's output.
Example contents of file.txt:
Avengers: Infinity War :movie
Yojimbo 1961 :movie nippon
Upvotes: 0
Views: 818
Reputation: 97948
You can use a Perl-compatible regexp and match word characters:
grep -oP ':\w+' file.txt | sort | uniq
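Or just match one or more non-space characters (requiring at least one keeps a bare ":" such as the one in "Avengers:" out of the output):
grep -o ':[^ ]\+' file.txt | sort | uniq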
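As a quick check against the example lines from the question, here is a minimal sketch using a slightly stricter, portable pattern (a colon followed by at least one lowercase letter) rather than either command above. It assumes a grep that supports -o (GNU and BSD grep do); the names tmp and tags are only for illustration.

```shell
# Recreate the two example lines from the question in a throwaway file.
tmp=$(mktemp)
printf '%s\n' 'Avengers: Infinity War :movie' 'Yojimbo 1961 :movie nippon' > "$tmp"

# Match only the colon and the letters, so no trailing space or newline
# ever becomes part of a match, and "Avengers:" is skipped entirely.
tags=$(grep -o ':[a-z][a-z]*' "$tmp" | sort -u)
printf '%s\n' "$tags"

rm -f "$tmp"
```

For these two lines the tag :movie is printed once. sort -u does the same job as sort | uniq.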
Upvotes: 1
Reputation: 37404
Some test lines (there is a space after the first :space; you can see it if you highlight the data with your mouse):
$ cat file
with :space
with :space too
without :space
test: this
With grep, sort and uniq:
$ grep -o ":[a-z]\+" file | sort | uniq
:space
With awk (well, gawk and mawk at least):
$ awk 'BEGIN{RS="[" FS "|" RS "]+"}/:[a-z]/&&!a[$0]++' file
:space
Each word is its own record, and we pick the first instance of every colon-starting word. RS="[" FS "|" RS "]+" could be written otherwise, but it is in this form to emphasize that any combination of FS and RS separates records.
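To make that record separator more concrete, here is a small sketch with the separator spelled out. It assumes an awk that accepts a regex RS (gawk and mawk do; POSIX awk does not guarantee it) and uses a temporary file and variable names made up for illustration. With the default FS (a space) and RS (a newline), the concatenation becomes the bracket expression [ |\n]+. Inside brackets the | is a literal character, not alternation, so it is left out below. The pattern is also anchored with ^ here, which is my own tightening, so that only words starting with a colon count.

```shell
tmp=$(mktemp)
# The first line ends in a trailing space, as in the test data above.
printf '%s\n' 'with :space ' 'with :space too' 'without :space' 'test: this' > "$tmp"

# Any run of spaces and/or newlines ends a record, so every word is a record.
# seen[] (a hypothetical array name) filters out words already printed.
words=$(awk 'BEGIN{RS="[ \n]+"} /^:[a-z]/ && !seen[$0]++' "$tmp")
printf '%s\n' "$words"

rm -f "$tmp"
```

Because the trailing space is consumed by the separator, all three occurrences collapse into a single :space.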
Upvotes: 2
Reputation: 2471
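You can try sed. With -n and the p flag, only lines that contain a tag are printed, and because the leading .* is greedy, only the last tag on each line is kept:
sed -n 's/.*\(:[a-z][a-z]*\).*/\1/p' file.txt | sort | uniq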
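sed can also be used the way the question tried to use tr: as a cleanup step after grep. The sketch below is only an illustration under my own assumptions (a grep with -o and -E, a throwaway file, illustrative variable names). It keeps the "tag followed by a space or end of line" idea from the question and then strips trailing whitespace from each match. Unlike tr -d '[:space:]', which also deletes the newlines between matches, this works line by line and leaves the line breaks alone.

```shell
tmp=$(mktemp)
printf '%s\n' 'Avengers: Infinity War :movie' 'Yojimbo 1961 :movie nippon' > "$tmp"

# grep emits ":movie" (end of line) and ":movie " (followed by a space);
# sed removes the trailing blank so both become identical before sort -u.
tags=$(grep -oE ':[a-z]+( |$)' "$tmp" | sed 's/[[:space:]]*$//' | sort -u)
printf '%s\n' "$tags"

rm -f "$tmp"
```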
Upvotes: 0
Reputation: 133458
Since you haven't provided a sample Input_file, I couldn't test this, and I don't have zsh available either. Try the following and let me know if it helps.
awk '/:[a-z]*/{sub(/ +$/,"");} !a[$0]++' Input_file | sort
Upvotes: 0