Displaying the Top 10 words in a string

Question

I am writing a program that grabs a txt file off the internet and reads it. It then displays a bunch of data related to that txt file. Now, this all works well, until we get to the end. The last thing I want to do is display the top 10 most frequent words used in the txt file. The code I have right now only displays the most frequent word 10 times. Can someone look at this and tell me what the problem is? The only part you have to look at is the last part.

import urllib
open = urllib.urlopen("http://www.textfiles.com/etext/FICTION/alice30.txt").read()

v = str(open)                 # this variable makes the file a string
strip = v.replace(" ", "")        # this trims spaces
char = len(strip)    # this variable counts the number of characters in the string
ch = v.splitlines()    # this variable separates the lines

line = len(ch)         # this counts the number of lines


print "Here's the number of lines in your file:", line

wordz = v.split()
print wordz

print "Here's the number of characters in your file:", char

spaces = v.count(' ')

words = ''.join(c if c.isalnum() else ' ' for c in v).split()

words = len(words)

print "Here's the number of words in your file:", words

topten = map(lambda x:filter(str.isalpha,x.lower()),v.split())
print "\n".join(sorted(words,key=words.count)[-10:][::-1])

Padraic Cunningham · Accepted Answer

Use collections.Counter to count the words. Counter.most_common(10) returns the ten most common words along with their counts:

from collections import Counter

wordz = v.split()  # v is the file contents as a string, as in the question
c = Counter(wordz)
print(c.most_common(10))
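
As a rough illustration of why sorting the raw list by count repeats one word, and how Counter avoids that, here is a small sketch on a made-up sample string (the sample text and the names sample, by_count and top are assumptions, not taken from the file in the question). Sorting keeps every duplicate, so the tail of the sorted list is the same word over and over; Counter keeps one key per distinct word.

```python
from collections import Counter

# Hypothetical sample text standing in for the downloaded file.
sample = "the cat and the hat and the bat"
words = sample.split()

# Sorting the full list by count keeps duplicates, so the last
# entries are all copies of the most frequent word.
by_count = sorted(words, key=words.count)
print(by_count[-3:])

# Counter stores each distinct word once, paired with its count.
top = Counter(words).most_common(2)
print(top)
```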

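To count the words line by line instead of reading the whole file into one string, iterate over the response inside a with block. The built-in open() only opens local files, so the URL has to go through urllib.urlopen (Python 2, as in the question), wrapped in contextlib.closing so that it can be used with with:

import urllib
from contextlib import closing
from collections import Counter

with closing(urllib.urlopen("http://www.textfiles.com/etext/FICTION/alice30.txt")) as f:
    c = Counter()
    for line in f:
        c.update(line.split())  # Counter.update adds to the existing counts
print(c.most_common(10))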
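
Splitting on whitespace alone treats "Alice" and "alice," as different words. If that matters, one possible approach is to normalize each word before counting, which is what the topten line in the question appears to be aiming for. A minimal sketch, assuming an iterable of text lines; count_words and the sample lines are hypothetical names introduced here for illustration:

```python
from collections import Counter

def count_words(lines):
    """Count lowercase, letters-only words across an iterable of text lines."""
    counts = Counter()
    for line in lines:
        for raw in line.split():
            # Keep only alphabetic characters, e.g. "Alice," -> "alice".
            word = "".join(ch for ch in raw.lower() if ch.isalpha())
            if word:  # skip tokens that were pure punctuation or digits
                counts[word] += 1
    return counts

# Hypothetical sample lines; an open text file would work the same way.
lines = ["Alice was tired.", "Alice, alice! -- 42"]
counts = count_words(lines)
print(counts.most_common(1))
```

Dropping non-letters also merges contractions ("don't" becomes "dont"), so whether this normalization is appropriate depends on the text.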

To get the total number of non-whitespace characters in the file, sum the length of each word multiplied by the number of times it appears:

print(sum(len(word) * count for word, count in c.items()))

To get the word count:

print(sum(c.values()))
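
A quick way to convince yourself that both totals are right is to compare them against the plain string on a small input. This is only a sanity-check sketch; the sample text and the names total_chars and total_words are assumptions made for the example:

```python
from collections import Counter

# Hypothetical sample text, not the file from the question.
text = "one two  two\nthree three three\n"
c = Counter(text.split())

total_chars = sum(len(word) * n for word, n in c.items())
total_words = sum(c.values())

# The character total matches the text with all whitespace removed.
print(total_chars == len("".join(text.split())))
# The word total matches the length of the split list.
print(total_words == len(text.split()))
```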
