Arie G.

Reputation: 87

How to run a script on all files in a directory?

I have a script that does some basic text cleaning and tokenizing, then counts and sorts word frequencies. I can get the script to work on individual files, but I need help running it on an entire directory. In short, I'd like to use this code to count the global word frequency across the whole directory (not return individual values for each file).

Here's my code:

import re
import string
from collections import Counter

# the with block closes the file once it has been read
with open("german/test/polarity/positive/0.txt", mode="r", encoding="utf-8") as file:
    read_file = file.read()

#remove punctuation
translation = str.maketrans("","", string.punctuation)
stripped_file = read_file.translate(translation)

##lowercase
file_clean = stripped_file.lower()

##tokenize
file_tokens = file_clean.split()

##word count and sort
def word_count(file_tokens):
    # Counter tallies the whole list in one call, so no loop is needed
    return Counter(file_tokens)

print(word_count(file_tokens))

Upvotes: 0

Views: 77

Answers (2)

adlopez15

Reputation: 4367

For Python >= 3.6, use os:

import os

# directory_in_str is a placeholder for the path to your directory
for filename in os.listdir(directory_in_str):
    if filename.endswith(".txt"):
        # full path of each .txt file; open and process it here
        print(os.path.join(directory_in_str, filename))

Please see here

Upvotes: 0

Personman

Reputation: 2323

You're probably looking for os.walk().

Move your code into a function, and then use

for subdir, dirs, files in os.walk(rootdir):
    for file in files:
        path = os.path.join(subdir, file)  # full path to hand to your function

to call the function on each file.
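
Putting that together with the cleaning steps from the question, a rough sketch might look like the following. The names count_words_in_file and count_words_in_tree and the suffix parameter are made up for illustration, and it assumes every file is UTF-8 text. Each file gets its own Counter, and Counter.update adds those counts into one running total, which gives the global frequency rather than per-file values.

```python
import os
import string
from collections import Counter

# Same punctuation-stripping table as in the question.
_PUNCT_TABLE = str.maketrans("", "", string.punctuation)


def count_words_in_file(path):
    """Return a Counter of lowercased, punctuation-free tokens in one file."""
    # Assumption: the file is UTF-8 encoded text.
    with open(path, mode="r", encoding="utf-8") as handle:
        text = handle.read()
    tokens = text.translate(_PUNCT_TABLE).lower().split()
    return Counter(tokens)


def count_words_in_tree(rootdir, suffix=".txt"):
    """Sum the word counts of every file under rootdir ending in suffix."""
    total = Counter()
    for subdir, dirs, files in os.walk(rootdir):
        for name in files:
            if name.endswith(suffix):
                # update() adds counts to the total instead of replacing them
                total.update(count_words_in_file(os.path.join(subdir, name)))
    return total
```

With the layout from the question, something like count_words_in_tree("german/test/polarity/positive").most_common(10) should then list the ten most frequent words across all files.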

Upvotes: 1
