ojblass
ojblass

Reputation: 21630

Code Golf 4th of July Edition: Counting Top Ten Occurring Words

Given the following list of presidents do a top ten word count in the smallest program possible:

INPUT FILE

    Washington
    Washington
    Adams
    Jefferson
    Jefferson
    Madison
    Madison
    Monroe
    Monroe
    John Quincy Adams
    Jackson
    Jackson
    Van Buren
    Harrison 
    DIES
    Tyler
    Polk
    Taylor 
    DIES
    Fillmore
    Pierce
    Buchanan
    Lincoln
    Lincoln 
    DIES
    Johnson
    Grant
    Grant
    Hayes
    Garfield 
    DIES
    Arthur
    Cleveland
    Harrison
    Cleveland
    McKinley
    McKinley
    DIES
    Teddy Roosevelt
    Teddy Roosevelt
    Taft
    Wilson
    Wilson
    Harding
    Coolidge
    Hoover
    FDR
    FDR
    FDR
    FDR
    Dies
    Truman
    Truman
    Eisenhower
    Eisenhower
    Kennedy 
    DIES
    Johnson
    Johnson
    Nixon
    Nixon 
    ABDICATES
    Ford
    Carter
    Reagan
    Reagan
    Bush
    Clinton
    Clinton
    Bush
    Bush
    Obama

To start it off in bash 97 characters

cat input.txt | tr " " "\n" | tr -d "\t " | sed 's/^$//g' | sort | uniq -c | sort -n | tail -n 10

Output:

      2 Nixon
      2 Reagan
      2 Roosevelt
      2 Truman
      2 Washington
      2 Wilson
      3 Bush
      3 Johnson
      4 FDR
      7 DIES

Break ties as you see fit! Happy fourth!

For those of you who care more information on presidents can be found here.

Upvotes: 11

Views: 1150

Answers (16)

Helen
Helen

Reputation: 97991

Windows Batch File

This is obviously not the smallest solution, but I decided to post it anyway, just for fun. :) NB: the batch file uses a temporary file named $ for storing temporary results.

Original uncompressed version with comments:

@echo off
setlocal enableextensions enabledelayedexpansion

set infile=%1
set cnt=%2
set tmpfile=$
set knownwords=

rem Calculate word count
for /f "tokens=*" %%i in (%infile%) do (
  for %%w in (%%i) do (

    rem If the word hasn't already been processed, ...
    echo !knownwords! | findstr "\<%%w\>" > nul
    if errorlevel 1 (

      rem Count the number of the word's occurrences and save it to a temp file
      for /f %%n in ('findstr "\<%%w\>" %infile% ^| find /v "" /c') do (
        echo %%n^|%%w >> %tmpfile%
      )

      rem Then add the word to the known words list
      set knownwords=!knownwords! %%w
    )
  )
)

rem Print top 10 word count
for /f %%i in ('sort /r %tmpfile%') do (
  echo %%i
  set /a cnt-=1
  if !cnt!==0 goto end
)

:end
del %tmpfile%

Compressed & obfuscated version, 317 characters:

@echo off&setlocal enableextensions enabledelayedexpansion&set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b

This can be shortened to 258 characters if echo is already off and command extensions and delayed variable expansion are on:

set n=%2&set l=
for /f "tokens=*" %%i in (%1)do for %%w in (%%i)do echo !l!|findstr "\<%%w\>">nul||for /f %%n in ('findstr "\<%%w\>" %1^|find /v "" /c')do echo %%n^|%%w>>$&set l=!l! %%w
for /f %%i in ('sort /r $')do echo %%i&set /a n-=1&if !n!==0 del $&exit /b

Usage:

> filename.bat input.txt 10 & pause

Output:

6|DIES
4|FDR
3|Johnson
3|Bush
2|Wilson
2|Washington
2|Truman
2|Roosevelt
2|Reagan
2|Nixon

Upvotes: 2

user4812
user4812

Reputation: 6732

Ruby

115 chars

w = File.read($*[0]).split
w.uniq.map{|x| [w.select{|y|x==y}.size,x]}.sort.last(10).each{|z| puts "#{z[1]} #{z[0]}"}

Upvotes: 2

matyr
matyr

Reputation: 5774

Ruby 66B

puts (a=$<.read.split).uniq.map{|x|"#{a.count x} "+x}.sort.last 10

Upvotes: 2

ephemient
ephemient

Reputation: 204964

Haskell, 102 characters (wow, so close to matching the original):

import List
(take 10.map snd.sort.map(\(x:y)->(-length y,x)).group.sort.words)`fmap`readFile"input.txt"

J, only 55 characters:

10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'

(I've yet to figure out how to elegantly perform text manipulations in J... it's much better at array-structured data.)


   NB. read the file
   <1!:1<'input.txt'
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...
|    Washington     Washington     Adams     Jefferson     Jefferson     Madison     Madison     Monroe     Monroe     John Quincy Adams     Jackson     Jackson     Van Buren     Harrison DIES     Tyler     Polk     Taylor DIES     Fillmore     Pierce     ...
+---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------...
   NB. split into lines
   <;._2[1!:1<'input.txt'
+--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----...
|    Washington|    Washington|    Adams|    Jefferson|    Jefferson|    Madison|    Madison|    Monroe|    Monroe|    John Quincy Adams|    Jackson|    Jackson|    Van Buren|    Harrison DIES|    Tyler|    Polk|    Taylor DIES|    Fillmore|    Pierce|    ...
+--------------+--------------+---------+-------------+-------------+-----------+-----------+----------+----------+---------------------+-----------+-----------+-------------+-----------------+---------+--------+---------------+------------+----------+----...
   NB. split into words
   ;;:&.><;._2[1!:1<'input.txt'
+----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---...
|Washington|Washington|Adams|Jefferson|Jefferson|Madison|Madison|Monroe|Monroe|John|Quincy|Adams|Jackson|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|DIES|Fillmore|Pierce|Buchanan|Lincoln|Lincoln|DIES|Johnson|Grant|Grant|Hayes|Garfield|DIES|Arthur|Cle...
+----------+----------+-----+---------+---------+-------+-------+------+------+----+------+-----+-------+-------+---+-----+--------+----+-----+----+------+----+--------+------+--------+-------+-------+----+-------+-----+-----+-----+--------+----+------+---...
   NB. count reptititions
   |:~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
|2         |2    |2        |2      |2     |1   |1     |2      |1  |1    |2       |6   |1    |1   |1     |1       |1     |1       |2      |3      |2    |1    |1       |1     |2        |2       |2        |1   |2     |1      |1       |1     |4  |2     |2     ...
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
|Washington|Adams|Jefferson|Madison|Monroe|John|Quincy|Jackson|Van|Buren|Harrison|DIES|Tyler|Polk|Taylor|Fillmore|Pierce|Buchanan|Lincoln|Johnson|Grant|Hayes|Garfield|Arthur|Cleveland|McKinley|Roosevelt|Taft|Wilson|Harding|Coolidge|Hoover|FDR|Truman|Eisenh...
+----------+-----+---------+-------+------+----+------+-------+---+-----+--------+----+-----+----+------+--------+------+--------+-------+-------+-----+-----+--------+------+---------+--------+---------+----+------+-------+--------+------+---+------+------...
   NB. sort
   |:\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
|6   |4  |3      |3   |2     |2         |2     |2        |2     |2    |2     |2       |2      |2      |2        |2      |2       |2    |2         |2      |2        |2    |1  |1    |1     |1   |1     |1   |1     |1    |1      |1   |1     |1    |1      |1   ...
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
|DIES|FDR|Johnson|Bush|Wilson|Washington|Truman|Roosevelt|Reagan|Nixon|Monroe|McKinley|Madison|Lincoln|Jefferson|Jackson|Harrison|Grant|Eisenhower|Clinton|Cleveland|Adams|Van|Tyler|Taylor|Taft|Quincy|Polk|Pierce|Obama|Kennedy|John|Hoover|Hayes|Harding|Garf...
+----+---+-------+----+------+----------+------+---------+------+-----+------+--------+-------+-------+---------+-------+--------+-----+----------+-------+---------+-----+---+-----+------+----+------+----+------+-----+-------+----+------+-----+-------+----...
   NB. take 10
   10{.\:~~.(,.~[:<"0@(+/)=/~);;:&.><;._2[1!:1<'input.txt'
+-+----------+
|6|DIES      |
+-+----------+
|4|FDR       |
+-+----------+
|3|Johnson   |
+-+----------+
|3|Bush      |
+-+----------+
|2|Wilson    |
+-+----------+
|2|Washington|
+-+----------+
|2|Truman    |
+-+----------+
|2|Roosevelt |
+-+----------+
|2|Reagan    |
+-+----------+
|2|Nixon     |
+-+----------+

Upvotes: 5

Brad Gilbert
Brad Gilbert

Reputation: 34130

Perl 86 characters

94, if you count the input filename.

perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for@{[sort{$_{$b}<=>$_{$a}}keys%_]}[0..10]}' test.in

If you don't care how many results you get, then it's only 75, excluding the filename.

perl -anE'$_{$_}++for@F;END{say"$_{$_} $_"for sort{$_{$b}<=>$_{$a}}keys%_}' test.in

Upvotes: 2

Nick Presta
Nick Presta

Reputation: 28705

The lack of AWK is disturbing.

xargs -n1<input.txt|awk '{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}'

75 characters.

If you want to get a bit more AWKy, you can forget xargs:

awk -v RS='[^a-zA-Z]' /./'{c[$1]++}END{for(p in c)print c[p],p|"sort|tail"}' input.txt

Upvotes: 3

jason
jason

Reputation: 241771

C#, 153:

Reads in the file at p and prints results to the console:

File.ReadLines(p)
    .SelectMany(s=>s.Split(' '))
    .GroupBy(w=>w)
    .OrderBy(g=>-g.Count())
    .Take(10)
    .ToList()
    .ForEach(g=>Console.WriteLine(g.Count()+"|"+g.Key));

If merely producing the list but not printing to the console, it's 93 characters.

6|DIES
4|FDR
3|Johnson
3|Bush
2|Washington
2|Adams
2|Jefferson
2|Madison
2|Monroe
2|Jackson

Upvotes: 12

SilentGhost
SilentGhost

Reputation: 319899

python 3.1 (88 chars)

import collections
collections.Counter(open('input.txt').read().split()).most_common(10)

Upvotes: 2

Josef Pfleger
Josef Pfleger

Reputation: 74557

Vim 36

:%s/\W/\r/g|%!sort|uniq -c|sort|tail

Upvotes: 7

mthurlin
mthurlin

Reputation: 27305

Python 2.6, 104 chars:

l=open("input.txt").read().split()
for c,n in sorted(set((l.count(w),w) for w in l if w))[-10:]:print c,n

Upvotes: 2

Stig Brautaset
Stig Brautaset

Reputation: 2632

A shorter shell version:

xargs -n1 < input.txt | sort | uniq -c | sort -nr | head

If you want case insensitive ranking, change uniq -c into uniq -ci.

Slightly shorter still, if you're happy about the rank being reversed and readability impaired by lack of spaces. This clocks in at 46 characters:

xargs -n1<input.txt|sort|uniq -c|sort -n|tail

(You could strip this down to 38 if you were allowed to rename the input file to simply "i" first.)

Observing that, in this special case, no word occur more than 9 times we can shave off 3 more characters by dropping the '-n' argument from the final sort:

xargs -n1<input.txt|sort|uniq -c|sort|tail

That takes this solution down to 43 characters without renaming the input file. (Or 35, if you do.)

Using xargs -n1 to split the file into one word on each line is preferable to the tr \ \\n solution, as that creates lots of blank lines. This means that the solution is not correct, because it misses out Nixon and shows a blank string showing up 256 times. However, a blank string is not a "word".

Upvotes: 11

ojblass
ojblass

Reputation: 21630

vim 38 and works for all input

:%!xargs -n1|sort|uniq -c|sort -n|tail

Upvotes: 2

Jonathan Leffler
Jonathan Leffler

Reputation: 754820

Here's a compressed version of the shell script, observing that for a reasonable interpretation of the input data (no leading or trailing blanks) that the second 'tr' and the 'sed' command in the original do not change the data (verified by inserting 'tee out.N' at suitable points and checking the output file sizes - identical). The shell needs fewer spaces than humans do - and using cat instead of input I/O redirection wastes space.

tr \  \\n<input.txt|sort|uniq -c|sort -n|tail -10

This weighs in at 50 characters including newline at end of script.

With two more observations (pulled from other people's answers):

  1. tail on its own is equivalent to 'tail -10', and
  2. in this case, numeric and alpha sorting are equivalent,

this can be shrunk by a further 7 characters (to 43 including trailing newline):

tr \  \\n<input.txt|sort|uniq -c|sort|tail

Using 'xargs -n1' (with no command prefix given) instead of 'tr' is extremely clever; it deals with leading, trailing and multiple embedded spaces (which this solution does not).

Upvotes: 2

user127239
user127239

Reputation:

My best try with ruby so far, 166 chars:

h = Hash.new
File.open('f.l').each_line{|l|l.split(/ /).each{|e|h[e]==nil ?h[e]=1:h[e]+=1}}
h.sort{|a,b|a[1]<=>b[1]}.last(10).each{|e|puts"#{e[1]} #{e[0]}"}

I am surprised that no one has posted a crazy J solution yet.

Upvotes: 2

Alan Haggai Alavi
Alan Haggai Alavi

Reputation: 74272

Perl: 90

Perl: 114 (Including perl, command-line switches, single quotes and filename)

perl -nle'$h{$_}++for split/ /;END{$i++<=10?print"$h{$_} $_":0for reverse sort{$h{$a}cmp$h{$b}}keys%h}' input.txt

Upvotes: 3

ojblass
ojblass

Reputation: 21630

vim 60

    :1,$!tr " " "\n"|tr -d "\t "|sort|uniq -c|sort -n|tail -n 10

Upvotes: 7

Related Questions