Reputation: 1141
Given an input file containing a single number per line, how could I get a count of how many times each value occurs in that file?
cat input.txt
1
2
1
3
1
0
Desired output (i.e., the counts [1, 3, 1, 1] for the values 0 through 3):
cat output.txt
0 1
1 3
2 1
3 1
It would be great if the solution could also be extended to floating-point numbers.
Upvotes: 47
Views: 47070
Reputation: 73
I had a problem similar to the one described, but across gigabytes of gzip'd log files. Because many of these solutions required waiting until all the data had been parsed, I opted to write rare to quickly parse and aggregate data based on a regexp.
In the case above, it's as simple as passing the data to the histogram function:
rare histo input.txt
# OR
cat input.txt | rare histo
# Outputs:
1 3
0 1
2 1
3 1
But it can also handle more complex cases via regex/expressions, such as:
rare histo --match "(\d+)" --extract "{1}" input.txt
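For comparison, a rough equivalent with standard tools might look like the sketch below (my own example, not from the rare documentation; logs*.gz is a placeholder). It illustrates the kind of pipeline rare is meant to replace, since sort cannot emit anything until it has read all of the input:
zcat logs*.gz | grep -oE '[0-9]+' | sort -n | uniq -c | awk '{print $2, $1}'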
Upvotes: 0
Reputation: 3451
perl -lne '$h{$_}++; END{for $n (sort {$a <=> $b} keys %h) {print "$n\t$h{$n}"}}' input.txt
Loop over each line with -n. Each number $_ increments the hash %h. Once the END of input.txt has been reached, sort the keys numerically with sort {$a <=> $b}, then print each number $n and its frequency $h{$n}.
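On the question's input.txt, this should print (columns separated by a tab):
0 1
1 3
2 1
3 1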
Similar code that works on floating-point input by grouping each value under its integer part:
perl -lne '$h{int($_)}++; END{for $n (sort {$a <=> $b} keys %h) {print "$n\t$h{$n}"}}' float.txt
float.txt
1.732
2.236
1.442
3.162
1.260
0.707
output:
0 1
1 3
2 1
3 1
Upvotes: 1
Reputation: 8406
Using maphimbu from the Debian stda package:
# use 'jot' to generate 100 random numbers between 1 and 5
# and 'maphimbu' to print sorted "histogram":
jot -r 100 1 5 | maphimbu -s 1
Output:
1 20
2 21
3 20
4 21
5 18
maphimbu also works with floating point:
jot -r 100.0 10 15 | numprocess /%10/ | maphimbu -s 1
Output:
1 21
1.1 17
1.2 14
1.3 18
1.4 11
1.5 19
Upvotes: 3
Reputation: 95612
In addition to the other answers, you can use awk to make a simple graph. (But, again, it's not a histogram.)
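A minimal sketch of that idea (my own example, building on the sort | uniq -c pipeline from the other answers): print each value followed by one asterisk per occurrence.
sort -n input.txt | uniq -c | awk '{printf "%s ", $2; for (i = 0; i < $1; i++) printf "*"; print ""}'
On the question's input this draws:
0 *
1 ***
2 *
3 *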
Upvotes: 1
Reputation: 15118
At least some of that can be done with
sort input.txt | uniq -c
But the column order (count before number) is the reverse of the desired output. This will fix that problem:
sort input.txt | uniq -c | awk '{print $2, $1}'
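With the question's input.txt this gives:
0 1
1 3
2 1
3 1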
Upvotes: 4
Reputation: 246867
Another option:
awk '{n[$1]++} END {for (i in n) print i,n[i]}' input.txt | sort -n > output.txt
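If floating-point values should be grouped under their integer part (one possible reading of the question), the same idea extends with int(); a sketch, reusing the float.txt sample from the Perl answer:
awk '{n[int($1)]++} END {for (i in n) print i, n[i]}' float.txt | sort -n > output.txt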
Upvotes: 12
Reputation: 5438
You mean you want a count of how many times each item appears in the input file? First sort it (using -n if the input is always numbers, as in your example), then count the unique results:
sort -n input.txt | uniq -c
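Note that uniq -c prints the count before the value. If you need the "number count" layout from the question, one way (a sketch reusing the column swap shown in an earlier answer) is:
sort -n input.txt | uniq -c | awk '{print $2, $1}' > output.txt
Because sort -n also orders decimals correctly, the same pipeline works for floating-point input, counting each distinct value exactly rather than binning it.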
Upvotes: 90