Reputation: 111
Using the following bit of code:
    for root, dirs, files in os.walk(corpus_name):
        for file in files:
            if file.endswith(".v4_gold_conll"):
                f= open(file)
                lines = f.readlines()
                tokens = [line.split()[3] for line in lines if line.strip()
                          and not line.startswith("#")]
                print(tokens)
I get the following error:
    Traceback (most recent call last):
      File "text_statistics.py", line 28, in <module>
        corpus_reading_pos(corpus_name, option)
      File "text_statistics.py", line 13, in corpus_reading_pos
        f= open(file)
    FileNotFoundError: [Errno 2] No such file or directory: 'abc_0001.v4_gold_conll'
As you can see, the file was, in fact, located, but then when I try to open the file, it... can't find it?
Edit: using this updated code, it stops after reading 7 files, but there are 172 files.
    def corpus_reading_token_count(corpus_name, option="token"):
        for root, dirs, files in os.walk(corpus_name):
            tokens = []
            file_count = 0
            for file in files:
                if file.endswith(".v4_gold_conll"):
                    with open((os.path.join(root, file))) as f:
                        tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
                    file_count += 1
        print(tokens)
        print("File count:", file_count)
Upvotes: 1
Views: 1502
Reputation: 82899
file is just the file name without the directory, which is root in your code. Try this:
    f = open(os.path.join(root, file))
You should also use with to open the file, and avoid file as a variable name, since it shadows the builtin type in Python 2. And judging from your comment, you should probably extend the list of tokens (use += instead of =):
    tokens = []
    for root, dirs, files in os.walk(corpus_name):
        for filename in files:
            if filename.endswith(".v4_gold_conll"):
                # join with root so the path is valid from any working directory
                with open(os.path.join(root, filename)) as f:
                    tokens += [line.split()[3] for line in f if line.strip() and not line.startswith("#")]
    print(tokens)
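As for the edited code in the question: it resets tokens and file_count at the start of every directory that os.walk visits, so the final count would only reflect the last directory. That would explain seeing 7 instead of 172, assuming the files are spread over several subdirectories. Here is a minimal sketch that initializes both once, before the walk. The function name collect_tokens, the suffix parameter, the UTF-8 encoding, and the guard against lines with fewer than four columns are my own assumptions, not something from the question.

```python
import os


def collect_tokens(corpus_name, suffix=".v4_gold_conll"):
    """Return (tokens, file_count) for all matching files under corpus_name.

    Hypothetical helper: name, suffix parameter and encoding are assumptions.
    """
    tokens = []        # initialized once, outside the walk
    file_count = 0     # likewise, so it accumulates across directories
    for root, dirs, files in os.walk(corpus_name):
        for filename in files:
            if not filename.endswith(suffix):
                continue
            # root + filename gives a path that is valid from any cwd
            with open(os.path.join(root, filename), encoding="utf-8") as f:
                for line in f:
                    if not line.strip() or line.startswith("#"):
                        continue  # skip blank lines and comment lines
                    columns = line.split()
                    if len(columns) > 3:  # assumed guard against short lines
                        tokens.append(columns[3])
            file_count += 1
    return tokens, file_count
```

Calling tokens, file_count = collect_tokens(corpus_name) and printing file_count should then report every matching file in the tree, not just those in the last directory.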
Upvotes: 2
Reputation: 168967
You'll have to join the root with the filename.
    for root, dirs, files in os.walk(corpus_name):
        for file in files:
            if file.endswith(".v4_gold_conll"):
                with open(os.path.join(root, file)) as f:
                    tokens = [
                        line.split()[3]
                        for line in f
                        if line.strip() and not line.startswith("#")
                    ]
                    print(tokens)
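If you would rather not join paths by hand, pathlib is an alternative to os.walk: Path.rglob yields full paths, so the missing directory cannot come up. This is only a sketch of that different approach, not what the code above does. The function name tokens_per_file, the pattern parameter, and the UTF-8 encoding are assumptions of mine.

```python
from pathlib import Path


def tokens_per_file(corpus_name, pattern="*.v4_gold_conll"):
    """Map each matching file path to its list of fourth-column tokens.

    Hypothetical helper: name, pattern parameter and encoding are assumptions.
    """
    result = {}
    # rglob searches corpus_name recursively and returns complete paths
    for path in sorted(Path(corpus_name).rglob(pattern)):
        if not path.is_file():
            continue
        with path.open(encoding="utf-8") as f:
            result[path] = [
                line.split()[3]
                for line in f
                if line.strip() and not line.startswith("#")
            ]
    return result
```

len(tokens_per_file(corpus_name)) then gives the number of files found, and each value is the per-file token list that the loop above prints.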
Upvotes: 0