Javier

Reputation: 1141

generating frequency table from file

Given an input file containing one single number per line, how could I get a count of how many times an item occurred in that file?

cat input.txt
1
2
1
3
1
0

desired output (=>[1,3,1,1]):

cat output.txt
0 1
1 3
2 1
3 1

It would be great if the solution could also be extended to floating-point numbers.

Upvotes: 47

Views: 47070

Answers (7)

zix99

Reputation: 73

I had a similar problem to the one described, but across gigabytes of gzip'd log files. Because many of these solutions require waiting until all the data has been parsed, I wrote rare to quickly parse and aggregate data based on a regexp.

In the case above, it's as simple as passing in the data to the histogram function:

rare histo input.txt
# OR
cat input.txt | rare histo

# Outputs:
1                   3         
0                   1         
2                   1         
3                   1

But it can also handle more complex cases via regex/expressions, such as:

rare histo --match "(\d+)" --extract "{1}" input.txt

Upvotes: 0

Chris Koknat

Reputation: 3451

perl -lne '$h{$_}++; END{for $n (sort {$a <=> $b} keys %h) {print "$n\t$h{$n}"}}' input.txt

Loop over each line with -n
Each number $_ increments its count in the hash %h
Once the END of input.txt has been reached,
sort {$a <=> $b} sorts the hash keys numerically
Print the number $n and the frequency $h{$n}

Similar code that handles floating-point input by truncating each value to its integer part with int():

perl -lne '$h{int($_)}++; END{for $n (sort {$a <=> $b} keys %h) {print "$n\t$h{$n}"}}' float.txt

float.txt

1.732
2.236
1.442
3.162
1.260
0.707

output:

0       1
1       3
2       1
3       1
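
The int() call above truncates each value, which amounts to binning with a width of 1. As a rough sketch of the same idea with an adjustable bin width, here is an awk version wrapped in a shell function. The name bin_floats and its optional WIDTH argument are made up for illustration, and the code assumes one number per line.

```shell
# bin_floats FILE [WIDTH]
# Hypothetical helper: count how many values fall into each bin of the
# given width (default 1) and print "lower_edge count", sorted numerically.
bin_floats() {
    awk -v w="${2:-1}" '
        NF {
            # int() truncates toward zero, so step down one bin when the
            # truncated edge lies above the value (negative inputs).
            b = int($1 / w)
            if (b * w > $1) b--
            count[b]++
        }
        END {
            # Convert each bin index back to the lower edge of the bin.
            for (b in count) print b * w, count[b]
        }
    ' "$1" | sort -n
}
```

Called as bin_floats float.txt, this should print the same four bins as the Perl output above, separated by spaces instead of tabs. A call such as bin_floats float.txt 0.5 uses half-unit bins; widths that are not exactly representable in binary (0.1, for example) can put values that sit on a bin edge into the neighbouring bin.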

Upvotes: 1

agc

Reputation: 8406

Using maphimbu from the Debian stda package:

# use 'jot' to generate 100 random numbers between 1 and 5
# and 'maphimbu' to print sorted "histogram":
jot -r 100 1 5 | maphimbu -s 1

Output:

             1                20
             2                21
             3                20
             4                21
             5                18

maphimbu also works with floating point:

jot -r 100.0 10 15 | numprocess /%10/ | maphimbu -s 1

Output:

             1                21
           1.1                17
           1.2                14
           1.3                18
           1.4                11
           1.5                19

Upvotes: 3

In addition to the other answers, you can use awk to make a simple graph. (But, again, it's not a histogram.)
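
One way such an awk graph might look, as a minimal sketch assuming one value per line. The function name bar_graph and the choice of '#' as the bar character are illustrative only.

```shell
# bar_graph FILE
# Hypothetical helper: print each distinct value, its count, and a bar
# made of one '#' per occurrence, sorted numerically by value.
bar_graph() {
    awk '
        NF { count[$1]++ }
        END {
            for (v in count) {
                bar = ""
                for (i = 0; i < count[v]; i++) bar = bar "#"
                print v, count[v], bar
            }
        }
    ' "$1" | sort -n
}
```

With the input.txt from the question, bar_graph input.txt would print "1 3 ###" for the value 1 and a single '#' for each of 0, 2 and 3.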

Upvotes: 1

pavium

Reputation: 15118

At least some of that can be done with

sort input.txt | uniq -c

But uniq -c prints the count before the number. Swapping the two columns with awk fixes that:

sort input.txt | uniq -c | awk '{print $2, $1}'

Upvotes: 4

glenn jackman

Reputation: 246867

Another option:

awk '{n[$1]++} END {for (i in n) print i,n[i]}' input.txt | sort -n > output.txt

Upvotes: 12

Caleb

Reputation: 5438

You mean you want a count of how many times an item appears in the input file? First sort it (using -n if the input is always numbers as in your example) then count the unique results.

sort -n input.txt | uniq -c

Upvotes: 90
