I'm currently working on some data analytics and I'm having a bit of trouble with the data preprocessing.
I have compiled a folder of text files, each named after the date it corresponds to. I was originally able to append all of the text files into one document, but I wanted to use a dictionary so that each record has two attributes: the filename (which is also the date) and the content of the text file.
This is the code:
```python
import json
import os
import math

# Define output filename
OutputFilename = 'finalv2.txt'

# Define path to input and output files
InputPath = 'C:/Users/Mike/Desktop/MonthlyOil/TextFiles'
OutputPath = 'C:/Users/Mike/Desktop/MonthlyOil/'

# Convert forward/backward slashes
InputPath = os.path.normpath(InputPath)
OutputPath = os.path.normpath(OutputPath)

# Define output file and open for writing
filename = os.path.join(OutputPath,OutputFilename)
file_out = open(filename, 'w')
print ("Output file opened")

size = math.inf

def append_record(record):
    with open('finalv2.txt', 'a') as f:
        json.dump(record, f)
        f.write(json.dumps(record))

# Loop through each file in input directory
for file in os.listdir(InputPath):
    # Define full filename
    filename = os.path.join(InputPath,file)
    if os.path.isfile(filename):
        print (" Adding :" + file)
        file_in = open(filename, 'r')
        content = file_in.read()
        dict = {'filename':filename,'content':content}
        print ("dict['filename']: ", dict['filename'] )
        append_record(dict)
        file_in.close()

# Close output file
file_out.close()
print ("Output file closed")
```
The problem I'm experiencing is that nothing gets appended to my file. I have a line in there that prints the dict to check whether it contains anything, and it does; I have tested both content and filename.
Any ideas what I'm missing to get the dict appended to the file?
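There are many issues, but the one causing the trouble here is that you're opening finalv2.txt twice: once with mode `w` at the full output path (and doing nothing with it), and again inside `append_record()`, this time with mode `a` and by its bare name. A bare name is resolved relative to the current working directory, so unless the script happens to be run from OutputPath, the records go into a different finalv2.txt and the one in OutputPath stays empty.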
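A bare filename passed to `open()` is resolved against the current working directory, not against OutputPath. The sketch below reproduces the question's two opens in throwaway temporary directories; `output_dir` stands in for OutputPath and `work_dir` for wherever the script is launched from (both are hypothetical, assuming the script is not run from OutputPath). It also shows that the question's `append_record()` writes every record twice, once via `json.dump` and once via `f.write(json.dumps(...))`.

```python
import json
import os
import tempfile

def reproduce(output_dir, work_dir):
    """Mimic the question: 'w' on the full output path, 'a' on the bare name."""
    out_path = os.path.join(output_dir, 'finalv2.txt')
    record = {'filename': 'example.txt', 'content': 'hello'}
    old_cwd = os.getcwd()
    os.chdir(work_dir)  # hypothetical: the script is launched from somewhere else
    try:
        with open(out_path, 'w'):  # like file_out: truncated, never written to
            # Bare name, so this resolves to work_dir/finalv2.txt
            with open('finalv2.txt', 'a') as f:
                json.dump(record, f)
                f.write(json.dumps(record))  # second copy of the same record
    finally:
        os.chdir(old_cwd)
    with open(out_path) as f:
        intended = f.read()
    with open(os.path.join(work_dir, 'finalv2.txt')) as f:
        stray = f.read()
    return intended, stray

with tempfile.TemporaryDirectory() as output_dir, tempfile.TemporaryDirectory() as work_dir:
    intended, stray = reproduce(output_dir, work_dir)

print(repr(intended))               # the file in output_dir is empty
print(stray.count('"filename"'))    # the stray file holds two copies of the record
```

Passing the full output path into `append_record()`, as the code below does, removes the dependence on the working directory.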
Consider the following:
```python
import json
import os
import math

# Define output filename
OutputFilename = 'finalv2.txt'

# Define path to input and output files
InputPath = 'C:/Users/Mike/Desktop/MonthlyOil/TextFiles'
OutputPath = 'C:/Users/Mike/Desktop/MonthlyOil/'

# Convert forward/backward slashes
InputPath = os.path.normpath(InputPath)
OutputPath = os.path.normpath(OutputPath)

# Define output file
out_file = os.path.join(OutputPath,OutputFilename)

size = None

def append_record(fn, record):
    with open(fn, 'a') as f:
        json.dump(record, f)
        #f.write(json.dumps(record))

# Loop through each file in input directory
for fn in os.listdir(InputPath):
    # Define full filename
    in_file = os.path.join(InputPath,fn)
    if os.path.isfile(in_file):
        print(" Adding: " + fn)
        with open(in_file, 'r') as file_in:
            content = file_in.read()
        d = {'filename':in_file, 'content':content}
        print("d['filename']: ", d['filename'] )
        append_record(out_file, d)
```
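This works as you expected. Compared with your version:

- files are opened with `with`, so they are closed automatically
- `dict` and `file` are no longer used as variable names (`dict` shadows the built-in type)
- `finalv2.txt` is defined in one place, and one place only
- `filename` is not defined twice, once as the output file and then again as the input file; instead there are `out_file` and `in_file`
- the output path is passed into the `append_record` function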
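One thing neither version handles: `json.dump` writes each record with no separator, so the output ends up as `{...}{...}`, which `json.load` rejects with an "Extra data" error. A minimal sketch of one alternative, writing one JSON object per line (the JSON Lines convention) and opening the output file only once. The helper names `write_records` and `read_records`, the sample file names, and the assumption that a file's name minus its extension is its date (e.g. `2019-01.txt` becomes `2019-01`) are all hypothetical.

```python
import json
import os
import tempfile

def write_records(input_dir, out_path):
    """Hypothetical helper: one JSON object per line, output opened once."""
    count = 0
    with open(out_path, 'w', encoding='utf-8') as out:
        for name in sorted(os.listdir(input_dir)):
            in_path = os.path.join(input_dir, name)
            if not os.path.isfile(in_path):
                continue
            with open(in_path, 'r', encoding='utf-8') as f:
                content = f.read()
            # Assumption: the file name without its extension is the date
            date = os.path.splitext(name)[0]
            record = {'filename': date, 'content': content}
            out.write(json.dumps(record) + '\n')  # newline keeps records separable
            count += 1
    return count

def read_records(out_path):
    """Hypothetical helper: parse the file back, one json.loads per non-empty line."""
    with open(out_path, 'r', encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Usage with throwaway directories and made-up input files
with tempfile.TemporaryDirectory() as input_dir, tempfile.TemporaryDirectory() as output_dir:
    for name, text in [('2019-01.txt', 'January notes'), ('2019-02.txt', 'February notes')]:
        with open(os.path.join(input_dir, name), 'w', encoding='utf-8') as f:
            f.write(text)
    out_path = os.path.join(output_dir, 'finalv2.txt')
    written = write_records(input_dir, out_path)
    records = read_records(out_path)

print(written, [r['filename'] for r in records])
```

Because the output is opened once with mode `w`, rerunning the script overwrites the file instead of appending a second copy of every record, which the append-per-record version above would do on each run.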