a3rxander

Reputation: 906

Read words from a .txt file and count each word

I'm wondering how to read a file word by word, the way fscanf does in C. I need to read every word in a .txt file and keep a count for each word.

import collections

collectwords = collections.defaultdict(int)

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        v = ""
        for char in line:
            if char != " ":
                v = v + char
            else:
                collectwords[v] += 1
                v = ""

This way, I can't read the last word.

Upvotes: 1

Views: 3020

Answers (3)

poke

Reputation: 388413

Uhm, like this?

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        for word in line.split():
            collectwords[word] += 1
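
For reference, here is a self-contained sketch of the same idea. The function name count_words and the sample lines are made up for illustration; the sample stands in for the contents of DatoSO.txt. Calling split() with no arguments splits on any run of whitespace, including the trailing newline, so the last word of each line is counted as well.

```python
import collections

def count_words(lines):
    """Count whitespace-separated words in an iterable of lines (hypothetical helper)."""
    collectwords = collections.defaultdict(int)
    for line in lines:
        # split() without arguments treats spaces, tabs and newlines alike,
        # so the final word on a line is not lost
        for word in line.split():
            collectwords[word] += 1
    return collectwords

# Sample lines standing in for the file contents (assumed data)
sample = ["one two two\n", "three two one"]
counts = count_words(sample)
print(dict(counts))  # {'one': 2, 'two': 3, 'three': 1}
```

Since a file object yields its lines when iterated, the same function should also work as count_words(filetxt) inside the with block.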

Upvotes: 3

JoshAdel

Reputation: 68752

You might also consider using collections.Counter if you are using Python >= 2.7:

http://docs.python.org/library/collections.html#collections.Counter

It adds a number of methods like 'most_common', which might be useful in this type of application.

From Doug Hellmann's PyMOTW:

import collections

c = collections.Counter()
with open('/usr/share/dict/words', 'rt') as f:
    for line in f:
        c.update(line.rstrip().lower())

print 'Most common:'
for letter, count in c.most_common(3):
    print '%s: %7d' % (letter, count)

Source: http://www.doughellmann.com/PyMOTW/collections/counter.html. Note that this example counts letters rather than words. In the c.update line, you would want to replace line.rstrip().lower() with line.split(), and perhaps add some code to get rid of punctuation.
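
As a rough sketch of that change, written for Python 3 (so print is a function): passing a list to Counter.update counts its elements, whereas passing a string counts its characters, which is why the PyMOTW version ends up with letter counts. The helper name most_common_words and the sample lines are assumptions for illustration only.

```python
import collections

def most_common_words(lines, n=3):
    """Return the n most frequent words in an iterable of lines (hypothetical helper)."""
    c = collections.Counter()
    for line in lines:
        # split() returns a list of words, so update() counts words, not letters
        c.update(line.lower().split())
    return c.most_common(n)

# Sample lines standing in for a real file (assumed data)
sample = ["The cat saw the dog\n", "the dog ran\n"]
top = most_common_words(sample, 2)
print('Most common:')
for word, count in top:
    print('%s: %7d' % (word, count))
# top is [('the', 3), ('dog', 2)]
```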

Edit: To remove punctuation, here is probably the fastest solution (Python 2):

import collections
import string

c = collections.Counter()
with open('DatoSO.txt', 'rt') as f:
    for line in f:
        c.update(line.translate(string.maketrans("",""), string.punctuation).split())

(borrowed from the question "Best way to strip punctuation from a string in Python")
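
Note that string.maketrans and the two-argument form of translate exist only in Python 2. On Python 3, a roughly equivalent sketch would build the table with str.maketrans, whose third argument lists the characters to delete. The names REMOVE_PUNCT and count_words_no_punct, and the sample lines, are made up for this illustration.

```python
import collections
import string

# Translation table that deletes every ASCII punctuation character
REMOVE_PUNCT = str.maketrans("", "", string.punctuation)

def count_words_no_punct(lines):
    """Count words in an iterable of lines after stripping punctuation (hypothetical helper)."""
    c = collections.Counter()
    for line in lines:
        c.update(line.translate(REMOVE_PUNCT).split())
    return c

# Sample lines standing in for the file contents (assumed data)
sample = ["Hello, world!\n", "Hello again; world.\n"]
counts = count_words_no_punct(sample)
print(counts.most_common(2))  # [('Hello', 2), ('world', 2)]
```

One caveat: string.punctuation includes the apostrophe, so a word like "don't" is counted as "dont" with this approach.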

Upvotes: 3

Artfunkel

Reputation: 1852

Python makes this easy:

collectwords = []

with open('DatoSO.txt', 'r') as filetxt:
    for line in filetxt:
        collectwords.extend(line.split())

Upvotes: 1
