Strelzik

Reputation: 1

Appending Multiple Text Files using Dictionaries in Python

I am currently doing some data analytics work and I'm having a bit of trouble with the data preprocessing.

I have compiled a folder of text files, each named after the date it corresponds to. I was originally able to append all of the text files to one document, but I want to use a dictionary so that each record has two attributes: the filename (which is also the date) and the content of the text file.

This is the code:

import json
import os
import math

# Define output filename
OutputFilename = 'finalv2.txt'

# Define path to input and output files
InputPath  = 'C:/Users/Mike/Desktop/MonthlyOil/TextFiles'
OutputPath = 'C:/Users/Mike/Desktop/MonthlyOil/'

# Convert forward/backward slashes
InputPath  = os.path.normpath(InputPath)
OutputPath = os.path.normpath(OutputPath)

# Define output file and open for writing
filename = os.path.join(OutputPath,OutputFilename)
file_out = open(filename, 'w')
print ("Output file opened")

size = math.inf

def append_record(record):
    with open('finalv2.txt', 'a') as f:
        json.dump(record, f)
        f.write(json.dumps(record))

# Loop through each file in input directory
for file in os.listdir(InputPath):
    # Define full filename
    filename = os.path.join(InputPath,file)
    if os.path.isfile(filename):
        print ("  Adding :" + file)
        file_in = open(filename, 'r')
        content = file_in.read()
        dict = {'filename':filename,'content':content}
        print ("dict['filename']: ", dict['filename'] )     
        append_record(dict)    
        file_in.close()


# Close output file
file_out.close()
print ("Output file closed")

The problem I am experiencing is that nothing gets appended to my file. I have a line in there that tests whether or not the dict contains anything, and it does; I have checked both content and filename.

Any ideas what I'm missing to get the dict appended to the file?

Upvotes: 0

Views: 1323

Answers (1)

jedwards

Reputation: 30230

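There are many issues, but the one that is causing the trouble here is that you're opening finalv2.txt twice: once with mode w (and doing nothing with it), and again inside append_record(), this time with mode a. Note also that the second open uses the bare name 'finalv2.txt', which is resolved against the current working directory rather than OutputPath, so the records may be ending up in a different finalv2.txt from the one you are checking.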
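To see why the two open() calls in the question can refer to different files, here is a minimal sketch that only resolves the two paths without writing anything. The helper name resolve_targets and the temporary directory standing in for OutputPath are assumptions made for illustration; they are not part of the original code.

```python
import os
import tempfile

def resolve_targets(output_dir, name):
    """Return the absolute paths the two open() calls would refer to."""
    # Like file_out in the question: the name joined onto the output directory.
    joined = os.path.abspath(os.path.join(output_dir, name))
    # Like open('finalv2.txt', 'a'): a bare name, resolved against the
    # current working directory.
    bare = os.path.abspath(name)
    return joined, bare

# Hypothetical output directory, created only for this demonstration.
with tempfile.TemporaryDirectory() as output_dir:
    joined, bare = resolve_targets(output_dir, "finalv2.txt")
    print("opened with 'w':", joined)
    print("opened with 'a':", bare)
    print("same file?", os.path.normcase(joined) == os.path.normcase(bare))
```

Unless the script happens to be run from inside the output directory, the two paths differ, which would explain why the file you look at stays empty.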

Consider the following:

import json
import os

# Define output filename
OutputFilename = 'finalv2.txt'

# Define path to input and output files
InputPath  = 'C:/Users/Mike/Desktop/MonthlyOil/TextFiles'
OutputPath = 'C:/Users/Mike/Desktop/MonthlyOil/'

# Convert forward/backward slashes
InputPath  = os.path.normpath(InputPath)
OutputPath = os.path.normpath(OutputPath)

# Define output file
out_file = os.path.join(OutputPath,OutputFilename)


def append_record(fn, record):
    with open(fn, 'a') as f:
        json.dump(record, f)
        #f.write(json.dumps(record))

# Loop through each file in input directory
for fn in os.listdir(InputPath):
    # Define full filename
    in_file = os.path.join(InputPath,fn)
    if os.path.isfile(in_file):
        print("  Adding: " + fn)
        with open(in_file, 'r') as file_in:
            content = file_in.read()
            d = {'filename':in_file, 'content':content}
            print("d['filename']: ", d['filename'] )
            append_record(out_file, d)

This works as you expected.

Here:

  • Files aren't explicitly opened and closed; they're managed by context managers (with)
  • There are no longer variables named dict and file, which shadowed Python built-in names
  • You define finalv2.txt in one place, and one place only
  • filename is no longer defined twice (once as the output file and then again as the input file). Instead there are out_file and in_file
  • You pass the output filename to your append_record function
  • You don't (attempt to) append the json twice -- only once (you can pick which method you prefer, they both work)
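One thing to be aware of: calling json.dump repeatedly writes the records back to back with nothing between them, which makes the file awkward to parse later. A possible variation, which is my assumption and not something the question asked for, is to write one record per line so each line can be loaded on its own. The read_records helper, the sample filenames and the temporary directory below are made up for the sketch.

```python
import json
import os
import tempfile

def append_record(fn, record):
    """Append one record to fn as a single line of JSON."""
    with open(fn, 'a', encoding='utf-8') as f:
        json.dump(record, f)
        f.write('\n')  # separator so each record sits on its own line

def read_records(fn):
    """Load every non-empty line of fn as one JSON record."""
    with open(fn, 'r', encoding='utf-8') as f:
        return [json.loads(line) for line in f if line.strip()]

# Hypothetical data in a throwaway directory, for demonstration only.
with tempfile.TemporaryDirectory() as tmp:
    out_file = os.path.join(tmp, 'finalv2.txt')
    append_record(out_file, {'filename': '2017-01.txt', 'content': 'line one\nline two'})
    append_record(out_file, {'filename': '2017-02.txt', 'content': 'more text'})
    records = read_records(out_file)
    print(len(records))
    print(records[0]['filename'])
```

Newlines inside the content are escaped by json.dump, so a multi-line text file still ends up as a single line in the output.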

Upvotes: 3
