user16085212

Count the number of occurrences of a list of words in multiple text files

I have a list of words and a list of files:

words = ["hello","my","name"]
files = ["file1.txt","file2.txt"]

What I want is to count the number of occurrences of every word in the list across all the text files.

My work so far:

import re
occ = []
for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except:
        pass
    data = fichier.read()
    for wrd in words:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
        occ.append(wrd + " : " + str(count))
    texto = open("occurence.txt", "w+b")
for ww in occ:
    texto.write(ww.encode("utf-8") + "\n".encode("utf-8"))

This code works fine with a single file, but when I try a list of files it gives me only the result of the last file.

Upvotes: 2

Views: 71

Answers (2)

Amal K

Reputation: 4874

Use a dictionary instead of a list:

import re
occ = {}  # Maps each word to its total count across all files
words = ["hello", "my", "name"]
files = ["f1.txt", "f2.txt", "f3.txt"]
for file in files:
    try:
        fichier = open(file, encoding="utf-8")
    except OSError:
        pass  # Skip files that cannot be opened
    else:  # Runs only if open() succeeded; must be indented to match try
        with fichier:
            data = fichier.read()
        for wrd in words:
            count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), data))
            if wrd in occ:
                occ[wrd] += count  # If wrd is already in the dictionary, increment its count
            else:
                occ[wrd] = count  # Otherwise add wrd to the dictionary with its count

print(occ)

If you want it as a list of strings as in your question:

occ_list = [ f"{key} : {value}" for key, value in occ.items() ]
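A more compact sketch of the same dictionary approach, assuming the file contents have already been read into strings, uses `collections.Counter` to add up the per-file counts and then writes the totals to `occurence.txt` in the question's `word : count` format. The helper names `count_words` and `write_counts` are hypothetical, chosen for illustration.

```python
import re
from collections import Counter


def count_words(words, texts):
    """Hypothetical helper: total whole-word counts of each word across all texts."""
    totals = Counter({wrd: 0 for wrd in words})  # Start at 0 so words that never appear are still listed
    for text in texts:
        for wrd in words:
            pattern = r'\b%s\b' % re.escape(wrd)  # \b restricts matches to whole words
            totals[wrd] += len(re.findall(pattern, text))
    return totals


def write_counts(totals, path="occurence.txt"):
    """Hypothetical helper: write one 'word : count' line per word, in text mode."""
    with open(path, "w", encoding="utf-8") as out:
        for wrd, count in totals.items():
            out.write(f"{wrd} : {count}\n")


texts = ["hello my name is Bob", "hello again, my friend"]
print(dict(count_words(["hello", "my", "name"], texts)))  # {'hello': 2, 'my': 2, 'name': 1}
```

To use it with real files, read each file with `open(file, encoding="utf-8")` inside a `with` block and pass the resulting strings as `texts`. Opening the output in text mode with `encoding="utf-8"` avoids the manual `.encode()` calls from the question.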

Upvotes: 0

Rakesh

Reputation: 82765

Use a JSON file to store the counts, so that each file's counts are added to the totals saved so far.

Example:

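import json
import os
import re

words = ["hello", "my", "name"]
files = ["file1.txt", "file2.txt"]

# Read the counts saved so far (start empty if the JSON file does not exist yet)
if os.path.exists('data_store.json'):
    with open('data_store.json') as jfile:
        counts = json.load(jfile)
else:
    counts = {}

for file in files:
    with open(file, encoding="utf-8") as f:
        text = f.read()  # Keep the file text separate from the counts dictionary
    for wrd in words:
        count = sum(1 for _ in re.finditer(r'\b%s\b' % re.escape(wrd), text))
        counts[wrd] = counts.get(wrd, 0) + count  # Increment count

# Write result to JSON
with open('data_store.json', "w") as jfile:
    json.dump(counts, jfile)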
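A self-contained sketch of the same idea, assuming one call per file's text: `update_store` (a hypothetical helper) loads the saved counts if the JSON file exists, adds the new whole-word counts, and saves them back, so the totals accumulate across files. The demo writes into a temporary directory so it does not touch a real `data_store.json`.

```python
import json
import os
import re
import tempfile


def update_store(store_path, words, text):
    """Hypothetical helper: add whole-word counts from `text` to the counts saved in `store_path`."""
    counts = {}
    if os.path.exists(store_path):  # Reuse previously saved totals, if any
        with open(store_path, encoding="utf-8") as jfile:
            counts = json.load(jfile)
    for wrd in words:
        found = len(re.findall(r'\b%s\b' % re.escape(wrd), text))
        counts[wrd] = counts.get(wrd, 0) + found
    with open(store_path, "w", encoding="utf-8") as jfile:
        json.dump(counts, jfile)  # Persist the updated totals for the next call
    return counts


with tempfile.TemporaryDirectory() as tmp:
    store = os.path.join(tmp, "data_store.json")
    update_store(store, ["hello", "my"], "hello my friend")
    print(update_store(store, ["hello", "my"], "hello again"))  # {'hello': 2, 'my': 1}
```

Because the store persists between calls (and between runs of the script), processing the same file twice counts its words twice; delete the JSON file to start over.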

Upvotes: 1
