Reputation: 21630
Given the following list of presidents, do a top-ten word count in the smallest program possible:
INPUT FILE
Washington Washington Adams Jefferson Jefferson Madison Madison Monroe Monroe John Quincy Adams Jackson Jackson Van Buren Harrison DIES Tyler Polk Taylor DIES Fillmore Pierce Buchanan Lincoln Lincoln DIES Johnson Grant Grant Hayes Garfield DIES Arthur Cleveland Harrison Cleveland McKinley McKinley DIES Teddy Roosevelt Teddy Roosevelt Taft Wilson Wilson Harding Coolidge Hoover FDR FDR FDR FDR Dies Truman Truman Eisenhower Eisenhower Kennedy DIES Johnson Johnson Nixon Nixon ABDICATES Ford Carter Reagan Reagan Bush Clinton Clinton Bush Bush Obama
To start it off, in bash, 97 characters:
cat input.txt | tr " " "\n" | tr -d "\t " | sed 's/^$//g' | sort | uniq -c | sort -n | tail -n 10
Output:
2 Nixon
2 Reagan
2 Roosevelt
2 Truman
2 Washington
2 Wilson
3 Bush
3 Johnson
4 FDR
7 DIES
Break ties as you see fit! Happy fourth!
For those of you who care, more information on presidents can be found here.
Upvotes: 11
Views: 1150
Reputation: 97991
This is obviously not the smallest solution, but I decided to post it anyway, just for fun. :) NB: the batch file uses a temporary file named $ for storing temporary results.
Original uncompressed version with comments:
@echo off
setlocal enableextensions enabledelayedexpansion
set infile=%1
set cnt=%2
set tmpfile=$
set knownwords=
rem Calculate word count
for /f "tokens=*" %%i in (%infile%) do (
for %%w in (%%i) do (
rem If the word hasn't already been processed, ...
echo !knownwords! | findstr "\<%%w\>" > nul
if errorlevel 1 (
rem Count the number of the word's occurrences and save it to a temp file
for /f %%n in ('findstr "\<%%w\>" %infile% ^| find /v "" /c') do (
echo %%n^|%%w >> %tmpfile%
)
rem Then add the word to the known words list
set knownwords=!knownwords! %%w
)
)
)
rem Print top 10 word count
for /f %%i in ('sort /r %tmpfile%') do (
echo %%i
set /a cnt-=1
if !cnt!==0 goto end
)
:end
del %tmpfile%
Compressed & obfuscated version, 317 characters:
@echo off&setlocal enableextensions enabledelayedexpansion&set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b
This can be shortened to 258 characters if echo is already off and command extensions and delayed variable expansion are on:
set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b
Usage:
> filename.bat input.txt 10 & pause
Output:
6|DIES
4|FDR
3|Johnson
3|Bush
2|Wilson
2|Washington
2|Truman
2|Roosevelt
2|Reagan
2|Nixon
Upvotes: 2
Reputation: 6732
Ruby
115 chars
w = File.read($*[0]).split
w.uniq.map{|x| [w.select{|y|x==y}.size,x]}.sort.last(10).each{|z| puts "#{z[1]} #{z[0]}"}
Upvotes: 2
Reputation: 5774
Ruby 66B
puts (a=$<.read.split).uniq.map{|x|"#{a.count x} "+x}.sort.last 10
Upvotes: 2
Reputation: 204964
Haskell, 102 characters (wow, so close to matching the original):
import List
(take 10.map snd.sort.map(\(x:y)->(-length y,x)).group.sort.words)`fmap`readFile"input.txt"
J, only 55 characters:
10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
(I've yet to figure out how to elegantly perform text manipulations in J... it's much better at array-structured data.)
NB. read the file <1!:1<'input.txt' +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------... | Washington Washington Adams Jefferson Jefferson Madison Madison Monroe Monroe John Quincy Adams Jackson Jackson Van Buren Harrison DIES Tyler Polk Taylor DIES Fillmore Pierce ... +---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------... NB. split into lines <;._2[1!:1<'input.txt' +--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----... | Washington| Washington| Adams| Jefferson| Jefferson| Madison| Madison| Monroe| Monroe| John Quincy Adams| Jackson| Jackson| Van Buren| Harrison DIES| Tyler| Polk| Taylor DIES| Fillmore| Pierce| ... +--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----... NB. split into words ;;:&.><;._2[1!:1<'input.txt' +----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---... 
|Washington|Washington|Adams|Jefferson|Jefferson|Madison|Madison|Monroe|Monroe|John|Quincy|Adams|Jackson|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|DIES|Fillmore|Pierce|Buchanan|Lincoln|Lincoln|DIES|Johnson|Grant|Grant|Hayes|Garfield|DIES|Arthur|Cle... +----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---... NB. count reptititions |:~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt' +----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------... |2 |2 |2 |2 |2 |1 |1 |2 |1 |1 |2 |6 |1 |1 |1 |1 |1 |1 |2 |3 |2 |1 |1 |1 |2 |2 |2 |1 |2 |1 |1 |1 |4 |2 |2 ... +----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------... |Washington|Adams|Jefferson|Madison|Monroe|John|Quincy|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|Fillmore|Pierce|Buchanan|Lincoln|Johnson|Grant|Hayes|Garfield|Arthur|Cleveland|McKinley|Roosevelt|Taft|Wilson|Harding|Coolidge|Hoover|FDR|Truman|Eisenh... +----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------... NB. 
sort |:\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt' +----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----... |6 |4 |3 |3 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |2 |1 |1 |1 |1 |1 |1 |1 |1 |1 |1 |1 |1 |1 |1 ... +----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----... |DIES|FDR|Johnson|Bush|Wilson|Washington|Truman|Roosevelt|Reagan|Nixon|Monroe|McKinley|Madison|Lincoln|Jefferson|Jackson|Harrison|Grant|Eisenhower|Clinton|Cleveland|Adams|Van|Tyler|Taylor|Taft|Quincy|Polk|Pierce|Obama|Kennedy|John|Hoover|Hayes|Harding|Garf... +----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----... NB. take 10 10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt' +-+----------+ |6|DIES | +-+----------+ |4|FDR | +-+----------+ |3|Johnson | +-+----------+ |3|Bush | +-+----------+ |2|Wilson | +-+----------+ |2|Washington| +-+----------+ |2|Truman | +-+----------+ |2|Roosevelt | +-+----------+ |2|Reagan | +-+----------+ |2|Nixon | +-+----------+
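For readers who don't speak J, here is a rough Python sketch of the same stages (split into lines, split into words, count repetitions, sort, take 10). The function name `top_counts` and the mapping of each step to a J primitive in the comments are my own reading of the walkthrough, not something the J code spells out, and the steps are not in exactly the same order as the J expression.

```python
def top_counts(text, n=10):
    """Return the n most frequent words as (count, word) pairs, highest first."""
    # Split into lines, then into words (roughly the `<;._2` and `;:` stages).
    words = [w for line in text.splitlines() for w in line.split()]
    # Unique words in first-seen order (roughly J's nub, `~.`).
    unique = list(dict.fromkeys(words))
    # Count each unique word by summing equality tests (roughly `+/` over `=/~`).
    pairs = [(sum(w == u for w in words), u) for u in unique]
    # Sort descending on (count, word), then keep the first n (roughly `\:~` and `10{.`).
    return sorted(pairs, reverse=True)[:n]


if __name__ == "__main__":
    sample = "Bush\nClinton\nClinton\nBush\nBush\nObama\n"
    for count, word in top_counts(sample, 2):
        print(count, word)
```

Sorting the pairs in descending order also puts tied words in reverse alphabetical order, which is consistent with the tie order in the J output above (Wilson before Washington before Truman).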
Upvotes: 5
Reputation: 34130
Perl
86 characters, or 94 if you count the input filename.
perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for@{[sort{$_{$b}<=>$_{$a}}keys%_]}[0..10]}' test.in
If you don't care how many results you get, then it's only 75, excluding the filename.
perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for sort{$_{$b}<=>$_{$a}}keys%_}' test.in
Upvotes: 2
Reputation: 28705
The lack of AWK is disturbing.
xargs -n1<input.txt|awk '{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}'
75 characters.
If you want to get a bit more AWKy, you can forget xargs:
awk -v RS='[^a-zA-Z]' /./'{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}' input.txt
Upvotes: 3
Reputation: 241771
C#, 153:
Reads in the file at p and prints results to the console:
File.ReadLines(p)
.SelectMany(s=>s.Split(' '))
.GroupBy(w=>w)
.OrderBy(g=>-g.Count())
.Take(10)
.ToList()
.ForEach(g=>Console.WriteLine(g.Count()+"|"+g.Key));
If merely producing the list but not printing to the console, it's 93 characters.
6|DIES
4|FDR
3|Johnson
3|Bush
2|Washington
2|Adams
2|Jefferson
2|Madison
2|Monroe
2|Jackson
Upvotes: 12
Reputation: 319899
python 3.1 (88 chars)
import collections
collections.Counter(open('input.txt').read().split()).most_common(10)
Upvotes: 2
Reputation: 27305
Python 2.6, 104 chars:
l=open("input.txt").read().split()
for c,n in sorted(set((l.count(w),w) for w in l if w))[-10:]:print c,n
Upvotes: 2
Reputation: 2632
A shorter shell version:
xargs -n1 < input.txt | sort | uniq -c | sort -nr | head
If you want case-insensitive ranking, change uniq -c into uniq -ci.
Slightly shorter still, if you're happy about the rank being reversed and readability impaired by lack of spaces. This clocks in at 46 characters:
xargs -n1<input.txt|sort|uniq -c|sort -n|tail
(You could strip this down to 38 if you were allowed to rename the input file to simply "i" first.)
Observing that, in this special case, no word occurs more than 9 times, we can shave off 3 more characters by dropping the '-n' argument from the final sort:
xargs -n1<input.txt|sort|uniq -c|sort|tail
That takes this solution down to 43 characters without renaming the input file. (Or 35, if you do.)
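To see why dropping '-n' is only safe while every count is a single digit, here is a small Python sketch. It assumes the counts are compared as plain, unpadded strings (real uniq -c output is padded with leading spaces, which this sketch ignores), and `lexical_matches_numeric` is just a name I made up for the illustration.

```python
def lexical_matches_numeric(counts):
    """True when sorting the counts as text gives the same order as sorting them as numbers."""
    as_text = sorted(str(c) for c in counts)        # what a plain text sort does
    as_numbers = [str(c) for c in sorted(counts)]   # what a numeric sort does
    return as_text == as_numbers


# All counts below 10: both orders agree.
print(lexical_matches_numeric([2, 6, 4, 3]))
# A two-digit count: the string "10" sorts before "2", so the orders differ.
print(lexical_matches_numeric([2, 10, 9]))
```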
Using xargs -n1 to split the file into one word on each line is preferable to the tr \ \\n solution, as that creates lots of blank lines. This means that the tr solution is not correct: it misses out Nixon and shows a blank string occurring 256 times, and a blank string is not a "word".
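The difference between the two ways of splitting can be sketched in Python, assuming the input contains leading or repeated spaces. The helper names `split_like_tr` and `split_like_xargs` are hypothetical; they only imitate the splitting behaviour, not everything tr or xargs does.

```python
def split_like_tr(text):
    """Turn every single space into a line break, as tr " " "\\n" does, then split on line breaks."""
    return text.replace(" ", "\n").split("\n")


def split_like_xargs(text):
    """Split on any run of whitespace, so no empty tokens are produced."""
    return text.split()


if __name__ == "__main__":
    sample = "    Nixon\n    Nixon ABDICATES\n"
    # Number of empty "words" the naive split produces for this sample.
    print(split_like_tr(sample).count(""))
    # The whitespace-aware split yields only real words.
    print(split_like_xargs(sample))
```

Every extra space becomes an empty token in the first version, and those empty tokens are what end up being counted as the most frequent "word".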
Upvotes: 11
Reputation: 21630
vim 38 and works for all input
:%!xargs -n1|sort|uniq -c|sort -n|tail
Upvotes: 2
Reputation: 754820
Here's a compressed version of the shell script. For a reasonable interpretation of the input data (no leading or trailing blanks), the second 'tr' and the 'sed' command in the original do not change the data (verified by inserting 'tee out.N' at suitable points and checking the output file sizes, which were identical). The shell needs fewer spaces than humans do, and using cat instead of input I/O redirection wastes space.
tr \ \\n<input.txt|sort|uniq -c|sort -n|tail -10
This weighs in at 50 characters including newline at end of script.
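With two more observations (pulled from other people's answers), namely that tail on its own is equivalent to 'tail -10' and that the '-n' on the final sort is not needed here because no word occurs more than 9 times, this can be shrunk by a further 7 characters (to 43 including trailing newline):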
tr \ \\n<input.txt|sort|uniq -c|sort|tail
Using 'xargs -n1' (with no command prefix given) instead of 'tr' is extremely clever; it deals with leading, trailing and multiple embedded spaces (which this solution does not).
Upvotes: 2
Reputation:
My best try with ruby so far, 166 chars:
h = Hash.new
File.open('f.l').each_line{|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}}
h.sort{|a,b|a[1]<=>b[1]}.last(10).each{|e|puts"#{e[1]} #{e[0]}"}
I am surprised that no one has posted a crazy J solution yet.
Upvotes: 2
Reputation: 74272
Perl: 90
Perl: 114 (Including perl, command-line switches, single quotes and filename)
perl -nle'$h{$_}++for split/ /;END{$i++<=10?print"$h{$_} $_":0for reverse sort{$h{$a}cmp$h{$b}}keys%h}' input.txt
Upvotes: 3
Reputation: 21630
vim 60
:1,$!tr " " "\n"|tr -d "\t "|sort|uniq -c|sort -n|tail -n 10
Upvotes: 7