ChingOwn

Reputation: 37

counting lengths of the words in a .txt

I have seen similar questions, but none of them really helped. I need to read in a text file, split it into words, and count the lengths of the words. I also want to print them in a table with the length of the word on the left and the actual word on the right. My code is a mess right now because I got stuck and decided to ask for help.

a = open('owlcreek.txt').read().split()
lengths = dict()
for word in a:
    length = len(word)

if length not in lengths:
    for length, counter in lengths.items():
        print "Words of length %d: %d" % (length, counter)

#words=[line for line in a]
#print ("\n" .join(counts))

I guess I will also need to write a small parser to strip out characters like "!--. I tried to use Counter, but I guess I don't know how to use it properly.

Upvotes: 1

Views: 159

Answers (2)

llb

Reputation: 1741

A simple regular expression will suffice to clear out the punctuation and spaces.

edit: If I'm understanding your problem correctly, you want all the unique words in a text file, sorted by length. In which case:

import re
import itertools

with open('README.txt', 'r') as f:
    # keep contractions like "don't" whole; the set discards duplicates
    unique_words = set(re.findall(r"\w+'\w+|\w+", f.read()))

# groupby only merges adjacent items, so sort by length first
sorted_words = sorted(unique_words, key=len)

for length, group in itertools.groupby(sorted_words, len):
    group = list(group)
    print("Words of length {0}: {1}".format(length, len(group)))
    for word in group:
        print(word)
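
To see what the pattern actually keeps, here is a quick check on an inline string instead of a file. The helper name extract_words and the sample sentence are made up for illustration; note that \w also matches digits and underscores, so this is only a rough definition of a "word".

```python
import re

def extract_words(text):
    # Try contractions such as "don't" first, then plain runs of word characters.
    return re.findall(r"\w+'\w+|\w+", text)

# Hypothetical sample text with quotes, punctuation and dashes mixed in.
sample = 'He said: "Don\'t stop!" -- and ran.'
words = extract_words(sample)
print(words)
```

Everything that is not a word character (quotes, colons, dashes, spaces) is simply skipped, so no separate parser should be needed for this kind of cleanup.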

Upvotes: 0

Paulo Bu

Reputation: 29794

It should be like this:

a=open('owlcreek.txt').read().split()
lengths=dict()
for word in a:
    length = len(word)
    # if the key is not present, add it
    if length not in lengths:
        # the value should be the list of words
        lengths[length] = []
    # append the word to the list for length key
    lengths[length].append(word)

# print them out as length, count(words of that length)
for length, wrds in lengths.items():
    print "Words of length %d: %d" % (length, len(wrds))
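
Since the question mentions Counter and a table with the length on the left and the word on the right, here is one possible sketch of both. The helper names length_counts and length_table and the sample sentence are made up for illustration; in practice the word list would come from the file. It is written so that it should run on Python 2.7 as well as Python 3.

```python
from collections import Counter, defaultdict

def length_counts(words):
    # Counter maps each word length to how many words have that length.
    return Counter(len(word) for word in words)

def length_table(words):
    # Map each length to the sorted unique words of that length.
    by_length = defaultdict(set)
    for word in words:
        by_length[len(word)].add(word)
    return {length: sorted(group) for length, group in by_length.items()}

# Hypothetical sample; replace with open('owlcreek.txt').read().split()
words = "the owl saw the old oak by the creek".split()

counts = length_counts(words)
for length in sorted(counts):
    print("Words of length %d: %d" % (length, counts[length]))

# Table: length on the left, word on the right.
for length, group in sorted(length_table(words).items()):
    for word in group:
        print("%-6d %s" % (length, word))
```

Counter counts every occurrence, while the table keeps each distinct word once; which of the two you want depends on whether repeated words should count.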

Hope this helps!

Upvotes: 3
