em456

Reputation: 423

Counting multiple strings using regex in multiple files

I'm trying to count how many times "type " appears in a text file, and the count needs to include the word that follows it. For example, how many times "type A" or "type apples" shows up in each of several files. I've got this far, but instead of adding up the matches, my code prints a separate count of 1 for each one. I thought it would be best to store the results in a dictionary so I could map each type to its count.

current output

file 1.txt {type A : 1}
file 1.txt {type A : 1}
file 2.txt {type apples : 1}
file 2.txt {type apples : 1}

However, this is what I would like. I'm a beginner at Python, so I feel like I'm missing something obvious.

expected output

file 1.txt {type A : 2}
file 2.txt {type apples : 2}

This is the code I have so far:

def find_files(d):
   for root, dirs, files in os.walk(d):
       for filename in files:
           if filename.endswith('.txt'):
               yield os.path.join(root, filename)

for file_name in find_files(d):
    with open(file_name, 'r') as f: 
        for line in f:
             results = defaultdict(int)
             line = line.lower().strip()
             match = re.search('type (\S+)', line)
             if match:
                results[match.group(0)] += 1
                print(file_name, results)

Upvotes: 1

Views: 138

Answers (1)

jignatius

Reputation: 6484

A few errors:

  • you're creating a new dictionary for each line, so every count starts over at 1; it's better to create one per file
  • re.search finds only the first match in a string; you could use re.findall to find all matches
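
To make the second point concrete, here is a small sketch of how the two functions differ. The sample line is made up for illustration, and the pattern is the one from the question, written as a raw string.

```python
import re

line = "type a and then type apples"  # made-up sample line

# re.search stops at the first match and returns a match object (or None).
first = re.search(r'type \S+', line)
print(first.group(0))  # type a

# re.findall returns a list with every non-overlapping match.
every = re.findall(r'type \S+', line)
print(every)  # ['type a', 'type apples']

# With a capture group, findall returns only the captured part,
# so the "type " prefix is dropped from each result.
words_only = re.findall(r'type (\S+)', line)
print(words_only)  # ['a', 'apples']
```

Note the last case: if the dictionary keys should read "type a" rather than just "a", the pattern passed to re.findall should not contain a capture group.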

Here's an amended version of your code:

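import re
from collections import defaultdict

for file_name in find_files(d):
    # one dictionary per file, created before reading any lines
    results = defaultdict(int)
    with open(file_name, 'r') as f:
        for line in f:
            line = line.lower().strip()
            # no capture group, so findall returns the whole "type <word>" text
            for match in re.findall(r'type \S+', line):
                results[match] += 1
    # print once per file, after all its lines have been counted
    print(file_name, dict(results))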
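
As an alternative, the counting step could probably be pulled out into a small function built on collections.Counter. This is only a sketch: the names TYPE_PATTERN and count_types are hypothetical, and the sample lines stand in for a real file. Since an open file is itself an iterable of lines, the same function should also work as count_types(f).

```python
import re
from collections import Counter

# Hypothetical name; matches "type " followed by one run of non-space characters.
TYPE_PATTERN = re.compile(r'type \S+')

def count_types(lines):
    """Count every 'type <word>' occurrence across an iterable of lines."""
    counts = Counter()
    for line in lines:
        # Lowercase first so "Type A" and "type a" share one key.
        counts.update(TYPE_PATTERN.findall(line.lower()))
    return counts

# Made-up lines standing in for the contents of one file.
sample = ["Type A is first", "then type a again", "no match here"]
counts = count_types(sample)
print(dict(counts))  # {'type a': 2}
```

Counter.update adds one to the count of each item in the list it is given, which replaces the explicit loop over matches.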

Upvotes: 1
