Count the number of occurrences of a list of words in multiple text files

Question

I have a list of words and a list of files:

words = ["hello","my","name"]
files = ["file1.txt","file2.txt"]

What I want is to count the number of occurrences of every word in the list across all the text files.

My work so far:

import re 
occ = []
for file in files:
 try:
   fichier = open(file, encoding="utf-8")
 except:
   pass
 data = fichier.read()
 for wrd in words:
    count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
    occ.append(wrd + " : " + str(count))
 texto = open("occurence.txt", "w+b")
for ww in occ:
   texto.write(ww.encode("utf-8")+"\n".encode("utf-8"))

This code works fine with a single file, but when I try a list of files it gives me only the result of the last file.

Amal K · Accepted Answer

Use a dictionary instead of a list:

import re

occ = {}  # Maps each word to its total count across all files
words = ["hello", "my", "name"]
files = ["f1.txt", "f2.txt", "f3.txt"]
for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except OSError:
        pass  # Skip files that cannot be opened
    else:
        # Runs once per file that opened successfully
        with fichier:
            data = fichier.read()
        for wrd in words:
            count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
            if wrd in occ:
                occ[wrd] += count  # wrd is already in the dictionary: add this file's count
            else:
                occ[wrd] = count  # First time wrd is seen: store this file's count

print(occ)

If you want it as a list of strings as in your question:

occ_list = [ f"{key} : {value}" for key, value in occ.items() ]
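
If you also want the totals written to a file as in your question, here is a minimal sketch of the same idea using collections.Counter. The function names count_words and write_report are hypothetical (they are not from the question), and the default output name occurence.txt is simply copied from your code. The output file is opened once, after all input files have been read, so every word gets a single line with its total.

```python
import re
from collections import Counter
from pathlib import Path


def count_words(words, files):
    """Return a Counter mapping each word to its total count across all files."""
    # One compiled pattern per word; \b keeps "my" from matching inside "mystery".
    patterns = {w: re.compile(r"\b%s\b" % re.escape(w)) for w in words}
    totals = Counter({w: 0 for w in words})  # every word starts at 0
    for name in files:
        try:
            data = Path(name).read_text(encoding="utf-8")
        except OSError:
            continue  # skip files that cannot be opened or read
        for w, pattern in patterns.items():
            totals[w] += len(pattern.findall(data))
    return totals


def write_report(totals, out_path="occurence.txt"):
    """Write one 'word : count' line per word, opening the output file once."""
    with open(out_path, "w", encoding="utf-8") as out:
        for word, count in totals.items():
            out.write(f"{word} : {count}\n")
```

Calling write_report(count_words(words, files)) should then produce one line per word. Words that never appear are still reported, with a count of 0, because the Counter is seeded with every word up front.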
