Reputation: 391
I am using the build_dict() function inside word_count_directory() to create a dictionary of word counts for three files in a directory. I want to create three dictionaries (one at a time, one for each file) and update the previous dictionary with each new one. My code creates a single dictionary (wordcount) that combines all three dictionaries at the same time. How can I accomplish this?
def build_dict(filename):
    # open in text mode; the with block closes the file automatically
    with open(filename) as f:
        words = f.read().split()
    count = {}
    for word in words:
        word = word.lower()
        if word not in count:
            count[word] = 1
        else:
            count[word] += 1
    return count
## print(build_dict("C:\\Users\\Phil2040\\Desktop\\word_count\\news1.txt"))
import os
import os.path
def word_count_directory(directory):
    wordcount = {}
    filelist = [os.path.join(directory, f) for f in os.listdir(directory)]
    for file in filelist:
        wordcount = build_dict(file)  # calling build_dict function
    return wordcount
print(word_count_directory("C:\\Users\\Phil2040\\Desktop\\Word_count"))
Upvotes: 1
Views: 3327
Reputation: 15537
Use collections.Counter.
Example files:
/tmp/foo.txt
hello world
hello world
foo bar
foo bar baz
/tmp/bar.txt
hello world
hello world
foo bar
foo bar baz
foo foo foo
You can create one Counter per file, then add them together!
from collections import Counter
def word_count(filename):
    with open(filename, 'r') as f:
        c = Counter()
        for line in f:
            # split() with no argument ignores blank lines and runs of spaces
            c.update(line.split())
    return c
files = ['/tmp/foo.txt', '/tmp/bar.txt']
counters = [word_count(filename) for filename in files]
# counters content (example):
# [Counter({'world': 2, 'foo': 2, 'bar': 2, 'hello': 2, 'baz': 1}),
# Counter({'foo': 5, 'world': 2, 'bar': 2, 'hello': 2, 'baz': 1})]
# Add all the word counts together:
total = sum(counters, Counter()) # sum needs an empty counter to start with
# total content (example):
# Counter({'foo': 7, 'world': 4, 'bar': 4, 'hello': 4, 'baz': 2})
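The question talks about updating the previous dictionary. With plain dicts, dict.update replaces the value for a repeated word instead of adding to it, while Counter.update and the + operator add the counts. A small sketch using hypothetical, hand-written counts (no files involved) shows the difference:

```python
from collections import Counter

# Hypothetical word counts for two files, written out by hand
first = {'hello': 2, 'foo': 2}
second = {'foo': 5, 'baz': 1}

# Plain dict.update overwrites: 'foo' becomes 5 (the second file only)
merged_dict = dict(first)
merged_dict.update(second)

# Counter.update adds: 'foo' becomes 2 + 5 = 7
merged_counter = Counter(first)
merged_counter.update(second)

# The + operator also adds, but returns a new Counter
added = Counter(first) + Counter(second)

print(merged_dict['foo'])        # 5
print(merged_counter['foo'])     # 7
print(added == merged_counter)   # True
```

One difference worth knowing: + keeps only positive counts, which never matters for word counts since they are always at least 1.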
Upvotes: 3
Reputation: 471
def word_count_directory(directory):
    filelist = [os.path.join(directory, f) for f in os.listdir(directory)]
    return [build_dict(file) for file in filelist]
This returns a list of dictionaries, one for each of your files.
If you want to get the word count of each file one after the other, you can use yield to turn the function into a generator:
def word_count_directory(directory):
    filelist = [os.path.join(directory, f) for f in os.listdir(directory)]
    for file in filelist:
        yield build_dict(file)

counts = word_count_directory(".")  # creates the generator; no file is read yet
next(counts)  # gets the word count of the first file
next(counts)  # . . . the second file
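To do what the question asks (one dictionary per file, folded into the previous total each time), the generator can be consumed in a for loop. This is a minimal sketch, assuming a hypothetical variant of word_count_directory that yields (path, counts) pairs, sorts the file names for a stable order, and skips subdirectories; the demo directory and its two files are made up:

```python
import os
import tempfile
from collections import Counter


def build_dict(filename):
    # Same counting logic as the question: lower-cased, whitespace-split words
    count = {}
    with open(filename) as f:
        for word in f.read().split():
            word = word.lower()
            count[word] = count.get(word, 0) + 1
    return count


def word_count_directory(directory):
    # Hypothetical variant: yield (path, counts) so the caller knows which file it got
    for name in sorted(os.listdir(directory)):
        path = os.path.join(directory, name)
        if os.path.isfile(path):  # skip subdirectories
            yield path, build_dict(path)


# Demo: a throwaway directory holding two made-up files
with tempfile.TemporaryDirectory() as demo_dir:
    for name, text in [('a.txt', 'hello world'), ('b.txt', 'Hello foo foo')]:
        with open(os.path.join(demo_dir, name), 'w') as f:
            f.write(text)

    running_total = Counter()
    for path, counts in word_count_directory(demo_dir):
        running_total.update(counts)  # add this file's counts to the total
        print(os.path.basename(path), dict(running_total))
```

Because a generator is lazy, each file is read only when the loop asks for the next item, so the running total can be inspected (or saved) after every file.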
For your first function you should take a look at the Counter class from the collections module.
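Following that suggestion, the question's build_dict can shrink to a few lines. A minimal sketch, assuming the same lower-casing and whitespace splitting as the original; the temporary file in the demo is hypothetical sample input:

```python
import os
import tempfile
from collections import Counter


def build_dict(filename):
    """Return a Counter of the lower-cased, whitespace-separated words in filename."""
    with open(filename) as f:
        # Counter tallies every item produced by the generator expression
        return Counter(word.lower() for word in f.read().split())


# Demo with a hypothetical temporary file
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('Hello world\nhello foo\n')
    path = tmp.name

counts = build_dict(path)
os.remove(path)
print(counts.most_common(1))  # [('hello', 2)]
```

Because Counter is a dict subclass, code that indexes the result like the original dictionary keeps working, and looking up a missing word returns 0 instead of raising KeyError.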
Upvotes: 1