I have a list of words and a list of text files:
words = ["hello","my","name"]
files = ["file1.txt","file2.txt"]
What I want is to count the number of occurrences of every word in the list across all the text files.
My work so far:
import re
occ = []
for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except:
        pass
    data = fichier.read()
    for wrd in words:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
        occ.append(wrd + " : " + str(count))
texto = open("occurence.txt", "w+b")
for ww in occ:
    texto.write(ww.encode("utf-8") + "\n".encode("utf-8"))
This code works fine with a single file, but when I try a list of files it gives me only the result of the last file.
Upvotes: 2
Views: 71
Reputation: 4874
Use a dictionary instead of a list, so the count for each word from every file is added to a single running total:
import re

occ = {}  # Create an empty dictionary
words = ["hello", "my", "name"]
files = ["f1.txt", "f2.txt", "f3.txt"]

for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except OSError:
        pass  # Skip files that cannot be opened
    else:
        with fichier:  # Close the file once it has been read
            data = fichier.read()
        for wrd in words:
            count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
            if wrd in occ:
                occ[wrd] += count  # If wrd is already in dictionary, increment occurrence count
            else:
                occ[wrd] = count  # Else add wrd to dictionary with occurrence count

print(occ)
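The same accumulation can be written more compactly with `dict.get`, which returns a default of 0 for a word that is not in the dictionary yet. This is a minimal sketch that assumes the file contents have already been read into strings; the function name `count_words` is hypothetical, and it uses `len(re.findall(...))` in place of summing over `re.finditer`, which counts the same non-overlapping matches:

```python
import re

def count_words(words, texts):
    """Return {word: total whole-word occurrences across all texts}."""
    occ = {}
    for data in texts:
        for wrd in words:
            # \b limits matches to whole words; re.escape protects special characters
            count = len(re.findall(r'\b%s\b' % re.escape(wrd), data))
            # get() supplies 0 the first time a word is seen
            occ[wrd] = occ.get(wrd, 0) + count
    return occ

texts = ["hello my name is Bob", "my my, hello again"]
print(count_words(["hello", "my", "name"], texts))  # {'hello': 2, 'my': 3, 'name': 1}
```

Because every file adds into the same dictionary, there is one total per word rather than one entry per word per file.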
If you want it as a list of strings as in your question:
occ_list = [ f"{key} : {value}" for key, value in occ.items() ]
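To get the totals into a file like the question's `occurence.txt`, open it once, after all input files have been processed, and in text mode with an encoding, so no manual `.encode()` calls are needed. A minimal sketch; `write_occurrences` is a hypothetical helper name:

```python
def write_occurrences(occ, path="occurence.txt"):
    """Write one 'word : count' line per entry of occ, overwriting path."""
    with open(path, "w", encoding="utf-8") as texto:
        for key, value in occ.items():
            texto.write(f"{key} : {value}\n")
```

For example, calling `write_occurrences(occ)` after the `for file in files:` loop writes one line per word with its combined count.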
Upvotes: 0
Reputation: 82765
Use json to keep the counts in a file and add to them as each text file is processed.
Ex:
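import json
import re

words = ["hello", "my", "name"]
files = ["file1.txt", "file2.txt"]

# Read JSON (start empty if the file does not exist yet)
try:
    with open('data_store.json', encoding="utf-8") as jfile:
        counts = json.load(jfile)
except FileNotFoundError:
    counts = {}

for file in files:
    with open(file, encoding="utf-8") as fichier:
        text = fichier.read()  # Keep the text separate from the counts dict
    for wrd in words:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), text))
        if wrd not in counts:
            counts[wrd] = 0
        counts[wrd] += count  # Increment Count
# Write Result to JSON
with open('data_store.json', "w", encoding="utf-8") as jfile:
    json.dump(counts, jfile)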
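If the counts should also accumulate between separate runs, the load, merge and save steps can be wrapped in one function. A minimal sketch, assuming the store is a flat JSON object mapping each word to an integer; `update_store` is a hypothetical name:

```python
import json
import os

def update_store(path, new_counts):
    """Add new_counts to the totals saved at path and return the merged totals."""
    counts = {}
    if os.path.exists(path):
        with open(path, encoding="utf-8") as jfile:
            counts = json.load(jfile)
    for wrd, count in new_counts.items():
        # Missing words start from 0, existing ones are incremented
        counts[wrd] = counts.get(wrd, 0) + count
    with open(path, "w", encoding="utf-8") as jfile:
        json.dump(counts, jfile)
    return counts
```

Calling `update_store('data_store.json', totals)` with the totals for one file, or for a whole batch, keeps the JSON file as a running total.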
Upvotes: 1