JimS

Reputation: 349

frequency count for file column in bash

I have a file with 8 columns using | as a delimiter, and I want to count the occurrence frequency of the words in the 8th column. I tried awk like this:

awk -F '{print $8}' | sort | uniq -c $FILE 

but instead I get the whole file printed, and I can't understand what I am doing wrong.

EDIT: Now I get the output I want, as below:

1  
2307 Internet Explorer       
369 Safari  
2785 Chrome  
316 Opera  
4182 Firefox  

but I can't understand where this "1" comes from.

Upvotes: 1

Views: 2737

Answers (3)

agc

Reputation: 8406

A cut-based answer (plus a bit of sed to surround each item with quotes, the better to make blank lines visible):

cut -d'|' -f8 "$FILE" | sed 's/.*/"&"/' | sort | uniq -c
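To see why the quotes help, here is a minimal sketch on made-up sample data (the three rows and the temporary file are assumptions for illustration, not the asker's file). An empty eighth field should show up as "" rather than as an invisible blank:

```shell
# Hypothetical sample: 8 pipe-delimited columns; the second row has an empty 8th field.
sample=$(mktemp)
printf '%s\n' \
  'a|b|c|d|e|f|g|Firefox' \
  'a|b|c|d|e|f|g|' \
  'a|b|c|d|e|f|g|Firefox' > "$sample"

# Same pipeline as above: each value is wrapped in quotes before counting.
result=$(cut -d'|' -f8 "$sample" | sed 's/.*/"&"/' | sort | uniq -c)
printf '%s\n' "$result"
rm -f "$sample"
```

The line counted once is the quoted empty string, which is much easier to spot than a bare count followed by nothing.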

Upvotes: 1

anubhava

Reputation: 785276

You can use awk alone to do this:

awk -F '|' '{freq[$8]++} END{for (i in freq) print freq[i], i}' file

This awk command uses | as the delimiter and an array freq keyed by $8. Each time it reads a value of $8, it increments that key's frequency (value) by 1. By the way, your own command needs the custom delimiter | and the file passed to awk, like this:

awk -F '|' '{print $8}' file | sort | uniq -c
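Note that awk's for (i in freq) loop visits keys in an unspecified order, so the single-command version may list the values in any order. A minimal sketch that sorts by count, highest first, assuming the same pipe-delimited layout (the function name count_col8 is made up for illustration):

```shell
# count_col8 is a hypothetical helper name; it takes the input file as its only argument.
count_col8() {
  # Count each distinct 8th field, then sort numerically by count, largest first.
  awk -F '|' '{freq[$8]++} END{for (i in freq) print freq[i], i}' "$1" | sort -rn
}

# Usage: count_col8 "$FILE"
```

Unlike uniq -c, this prints the counts without leading padding, which can be handier if the output feeds another script.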

Upvotes: 3

Ed Morton

Reputation: 203684

Among other things, you're running uniq on $FILE instead of running awk on $FILE and piping the results to sort then uniq. You meant to write:

awk -F'|' '{print $8}' "$FILE" | sort | uniq -c

but all you need is one command:

awk -F'|' '{cnt[$8]++} END{for (key in cnt) print cnt[key], key}' "$FILE"

As for where the "1" comes from: you have one empty $8 in your input file, maybe from a blank line. You can find it with:

awk -F'|' '$8~/^[[:space:]]*$/{print NR, "$0=<"$0">, $8=<"$8">"}' "$FILE"
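If the blank entry turns out to be noise, the same test can be negated to leave it out of the counts. A minimal sketch, assuming you simply want to skip rows whose $8 is empty or whitespace-only (count_nonblank is a made-up name, not part of any tool):

```shell
# count_nonblank is a hypothetical helper; it takes the input file as its only argument.
count_nonblank() {
  # Skip rows whose 8th field is empty or only whitespace, then count the rest.
  awk -F'|' '$8 !~ /^[[:space:]]*$/ {cnt[$8]++} END{for (key in cnt) print cnt[key], key}' "$1"
}

# Usage: count_nonblank "$FILE"
```

Whether dropping those rows is right depends on the data; if an empty $8 is meaningful, keep the original command and look at the offending line instead.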

Upvotes: 2
