Reputation: 349
I have a file with 8 columns using | as a delimiter, and I want to count how often each word occurs in the 8th column. I tried awk like this:
awk -F '{print $8}' | sort | uniq -c $FILE
but instead the whole file gets printed, and I can't understand what I am doing wrong.
EDIT: Now I get printed what I want as below:
1
2307 Internet Explorer
369 Safari
2785 Chrome
316 Opera
4182 Firefox
but I can't understand where this "1" comes from.
Upvotes: 1
Views: 2737
Reputation: 8406
A cut-based answer (plus a bit of sed to surround items with quotes, the better to make blank lines visible):
cut -d'|' -f8 "$FILE" | sed 's/.*/"&"/' | sort | uniq -c
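Here is a small sketch of that pipeline on made-up data, just to show what the quoting buys you. The temp file and its four records are assumptions for illustration, not the asker's data; the last record has an empty 8th field.

```shell
# Hypothetical sample: four records, the last one with an empty 8th field.
sample=$(mktemp)
printf '%s\n' \
  'a|b|c|d|e|f|g|Firefox' \
  'a|b|c|d|e|f|g|Chrome' \
  'a|b|c|d|e|f|g|Firefox' \
  'a|b|c|d|e|f|g|' > "$sample"

# Same pipeline as above: the sed step wraps every value in double quotes,
# so the empty value is reported as "" instead of an invisible blank.
result=$(cut -d'|' -f8 "$sample" | sed 's/.*/"&"/' | sort | uniq -c)
printf '%s\n' "$result"
rm -f "$sample"
```

Without the sed step, the empty value would show up as a bare count with nothing after it, which is easy to misread.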
Upvotes: 1
Reputation: 785276
You can just use awk to do this:
awk -F '|' '{freq[$8]++} END{for (i in freq) print freq[i], i}' file
This awk command uses | as the delimiter and an array freq keyed by $8. For every line it reads, it increments the count stored under that line's $8 by 1. The END block then prints each count followed by its key.
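To make the counting concrete, here is a minimal sketch on made-up input. The temp file and its three records are assumptions for illustration. The extra sort at the end is my addition: awk visits array keys in an unspecified order, so sorting just keeps the output stable.

```shell
# Hypothetical sample: Firefox appears twice, Chrome once.
sample=$(mktemp)
printf '%s\n' \
  '1|2|3|4|5|6|7|Firefox' \
  '1|2|3|4|5|6|7|Chrome' \
  '1|2|3|4|5|6|7|Firefox' > "$sample"

# freq[$8]++ creates the entry the first time a value is seen (starting from 0)
# and adds 1 for every line that carries that value in column 8.
# "for (i in freq)" has no guaranteed order, hence the trailing sort.
counts=$(awk -F '|' '{freq[$8]++} END{for (i in freq) print freq[i], i}' "$sample" | sort)
printf '%s\n' "$counts"   # prints "1 Chrome" then "2 Firefox"
rm -f "$sample"
```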
By the way, you need to add the custom delimiter | to your command and use it like this:
awk -F '|' '{print $8}' file | sort | uniq -c
Upvotes: 3
Reputation: 203684
Among other things, you're running uniq on $FILE instead of running awk on $FILE and piping the results to sort and then uniq. You meant to write:
awk -F'|' '{print $8}' "$FILE" | sort | uniq -c
but all you need is one command:
awk -F'|' '{cnt[$8]++} END{for (key in cnt) print cnt[key], key}' "$FILE"
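If it helps to see why the original pipeline printed the whole file: as far as I can tell, '{print $8}' was consumed as the argument to -F, so awk had no program to run, and uniq then read $FILE itself because it was given a file operand. A rough sketch of that last part, using a made-up two-line file (an assumption for illustration):

```shell
# Hypothetical sample file with two distinct lines.
sample=$(mktemp)
printf '%s\n' 'x|Firefox' 'y|Chrome' > "$sample"

# uniq is given a file operand, so it reads that file and ignores whatever
# arrives on standard input. Every line is distinct, so each one comes back
# whole, prefixed with a count of 1.
out=$(printf 'ignored\n' | uniq -c "$sample")
printf '%s\n' "$out"
rm -f "$sample"
```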
As for where the "1" comes from: you have one line with an empty $8 in your input file, maybe a blank line. You can find it with:
awk -F'|' '$8~/^[[:space:]]*$/{print NR, "$0=<"$0">, $8=<"$8">"}' "$FILE"
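For illustration, this is roughly what that check reports when there is a blank line in the input. The temp file and its contents are made up; line 2 is deliberately blank.

```shell
# Hypothetical sample: line 2 is a blank line, so its $8 is empty.
sample=$(mktemp)
printf '%s\n' \
  'a|b|c|d|e|f|g|Firefox' \
  '' \
  'a|b|c|d|e|f|g|Chrome' > "$sample"

# Print the line number plus the whole record and field 8 inside <...>
# for every record whose 8th field is empty or only whitespace.
found=$(awk -F'|' '$8~/^[[:space:]]*$/{print NR, "$0=<"$0">, $8=<"$8">"}' "$sample")
printf '%s\n' "$found"   # prints: 2 $0=<>, $8=<>
rm -f "$sample"
```

The angle brackets are there so that trailing spaces or tabs in the record become visible.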
Upvotes: 2