user1739581

Reputation: 85

Python: read multiple source txt files, copy by criteria into 1 output file

My objective is to read multiple small txt source files in a folder, then copy the lines that match certain criteria into one output txt file. This works with one source file, but the output file is empty when I try to read multiple files the same way.

Based on my research on SO, I wrote the following code (it produces no output):

import glob
# import re  --- taken out as 'overkill'

path = 'C:/Doc/version 1/Input*.txt'   # read source files in this folder with this name format
list_of_files=glob.glob(path)   

criteria = ['AB', 'CD', 'EF']   # select lines that start with criteria

#list_of_files = glob.glob('./Input*.txt')

with open("P_out.txt", "a") as f_out:
    for fileName in list_of_files:
        data_list = open( fileName, "r" ).readlines()
    for line in data_list:
        for letter in criteria:
            if line.startswith(letter): 
                f_out.write('{}\n'.format(line))

Thank you for your help.

@abe and @ppperry: I'd like to particularly thank you for your earlier input.

Upvotes: 1

Views: 1042

Answers (2)

abe

Reputation: 504

The errors:

  1. Line #14 should look for lines in data_list, not fileName.
  2. "I can do this with 1 source file, but I have no output (empty) when I try to read multiple files and do the same." Lines 14 through 17 should be indented. Otherwise they run only once, after the for loop over list_of_files has finished, so only the last file read is searched.
  3. You did not even use lines 4 and 5, so why include them? They have no effect.

Here is your code fixed, with comments:

import glob
import re

#path = 'C:\Doc\version 1\Output*.txt'   # read all source files with this name format
#files=glob.glob(path)

criteria = ['AB', 'CD', 'EF']   # select lines that start with criteria

list_of_files = glob.glob('./Output*.txt')

with open("P_out.txt", "a") as f_out: #use "a" so you can keep the data from the last Output.txt
    for fileName in list_of_files:
        data_list = open( fileName, "r" ).readlines()
        #indenting the below will allow you to search through all files.
        for line in data_list: #Search data_list, not fileName
            for letter in criteria:
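                if re.match(re.escape(letter), line): #match anchors at the start of the line; search would match anywhere in it
                    f_out.write(line.rstrip('\n') + '\n')
                    #I recommend ending with exactly one \n so that the text does not get concatenated when moving from file to file.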

#Really? I promise with will not lie to you. 
#f_out.close()  # 'with' construction should close files, yet I make sure they close

For those who downvoted, why not include a comment to justify your judgment? Everything the OP requested has been satisfied. If you think you can further improve the answer, suggest an edit. Thank you.

Upvotes: -1

pppery

Reputation: 3814

Problems with your code:

  1. You have two duplicate variables files and list_of_files but only use the latter.
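  2. Every time you open a file, you overwrite the variable data_list, which discards the contents of the previously read file. Because the loop over the lines sits outside the loop over the files, only the last file is ever searched.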
  3. When you search the file for matching lines, you use the variable fileName instead of data_list!
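
Problem 2 can be reproduced without touching the disk. The following is a minimal sketch, assuming two hypothetical in-memory "files" (the dictionary fake_files and both function names are made up for illustration and do not appear in your code). It only shows how the indentation of the line loop decides which files get searched.

```python
# Hypothetical stand-ins for the contents of two input files.
fake_files = {
    "Input1.txt": ["AB first\n", "ZZ skip\n"],
    "Input2.txt": ["CD second\n"],
}


def lines_seen_dedented(files):
    """Mimic the question's layout: the line loop runs after the file loop."""
    seen = []
    data_list = []
    for name in files:
        data_list = files[name]  # overwritten on every iteration
    for line in data_list:       # runs once, on the last file only
        seen.append(line)
    return seen


def lines_seen_nested(files):
    """Line loop nested inside the file loop: every file is searched."""
    seen = []
    for name in files:
        data_list = files[name]
        for line in data_list:
            seen.append(line)
    return seen
```

With the dedented layout, only the lines of the last file reach the criteria check. If that file happens to contain no matching lines, the output stays empty.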

Places that could use simplification:

  1. Using the re module is overkill for just finding out whether a string starts with another string. You can use line.startswith(letter).
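
Putting these points together, one possible version looks like the sketch below. It is not the only way to do it. The function name copy_matching_lines and its parameters are my own invention; it assumes the output file name does not itself match the input pattern, and it relies on str.startswith accepting a tuple of prefixes.

```python
import glob


def copy_matching_lines(pattern, out_path, prefixes):
    """Append each line that starts with one of `prefixes` to `out_path`.

    Returns the number of lines written.
    """
    prefixes = tuple(prefixes)  # str.startswith accepts a tuple of prefixes
    written = 0
    with open(out_path, "a") as f_out:
        # sorted() only makes the processing order predictable.
        for file_name in sorted(glob.glob(pattern)):
            with open(file_name, "r") as f_in:
                for line in f_in:
                    if line.startswith(prefixes):
                        # Lines read from a file keep their trailing newline,
                        # so one is added only when the last line lacks it.
                        f_out.write(line if line.endswith("\n") else line + "\n")
                        written += 1
    return written
```

For the setup in the question this would be called as copy_matching_lines('C:/Doc/version 1/Input*.txt', 'P_out.txt', criteria). Iterating over the file object directly avoids holding a whole file in memory, and the inner with block closes each input file as soon as it has been read.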

Upvotes: 2
