CodeDependency

Reputation: 135

Extracting specific lines from a directory of similar files and writing them to an output file

I have several data files in a directory which look like this:

HETATM 8567  H   UNL     1     162.011 131.168 160.668  1.00  0.00           H  
HETATM 8568  H   UNL     1     160.551 132.538 159.626  1.00  0.00           H  
HETATM 8569  H   UNL     1     162.068 133.200 159.023  1.00  0.00           H  
HETATM 8570  H   UNL     1     164.188 130.964 160.292  1.00  0.00           H  
HETATM 8571  H   UNL     1     163.936 132.256 159.127  1.00  0.00           H  
ATOM   8551  O   ASN E 217     145.630 117.961 137.024  1.00  0.00           O  
ATOM   8552  CB  ASN E 217     144.218 115.615 138.674  1.00  0.00           C  
ATOM   8553  CG  ASN E 217     144.121 114.564 139.754  1.00  0.00           C  
ATOM   8554  OD1 ASN E 217     144.830 113.560 139.724  1.00  0.00           O  
ATOM   8555  ND2 ASN E 217     143.233 114.784 140.715  1.00  0.00           N  
ATOM   8551  O   ASN E 217     145.630 117.961 137.024  1.00  0.00           O  
ATOM   8552  CB  ASN E 217     144.218 115.615 138.674  1.00  0.00           C  
ATOM   8553  CG  ASN E 217     144.121 114.564 139.754  1.00  0.00           C  
ATOM   8554  OD1 ASN E 217     144.830 113.560 139.724  1.00  0.00           O  
ATOM   8555  ND2 ASN E 217     143.233 114.784 140.715  1.00  0.00           N                                                    

I only want the lines containing UNL, taken from all the files in the directory and written into a single text file. However, the code I have so far doesn't write them to the text file correctly: I can output the lines from a single file, but not from all of them.

import os

list_of_file_names = os.listdir(pathway) # pathway being the 
empty_array = []

name_of_output_file = "All the ligands from " +  pathway.split("\\")[-1] + ".txt" # pathway.split("\\")[-1] puts the name of the subfolder into the name of the output file.

for file_name in list_of_file_names:

    if "Partial Pocket.pdb" in file_name:

        working_file_path = pathway + "\\" + file_name

        input_file_from_working_file_path = open(working_file_path,"r+")

        master_output_file = open(name_of_output_file, "w")

        #lines_from_file_array = input_file_from_working_file_path.readline()
        lines_from_file_array = input_file_from_working_file_path.readlines()
        for lines_i_actually_want in lines_from_file_array:
            if "   UNL     1     " in lines_i_actually_want or  "  LIG X   1     " in lines_i_actually_want or "CONNECT" in lines_i_actually_want or "LIG X   0" in lines_i_actually_want or "UNL" in lines_i_actually_want or "UNL X" in lines_i_actually_want or "  UNK X   0     " in lines_i_actually_want or "CRYST1    1.000    1.000    1.000  90.00  90.00  90.00 P 1           1" in lines_i_actually_want or "ALA" == lines_i_actually_want[3] or "ARG" == lines_i_actually_want[3] or "ASN" == lines_i_actually_want[3] or "ASP" == lines_i_actually_want[3] or "CYS" == lines_i_actually_want[3] or "GLN" == lines_i_actually_want[3] or "GLU" == lines_i_actually_want[3] or "GLY" == lines_i_actually_want[3] or "HIS" == lines_i_actually_want[3] or "ILE" == lines_i_actually_want[3] or "LEU" == lines_i_actually_want[3] or "LYS" == lines_i_actually_want[3] or "MET" == lines_i_actually_want[3] or "PHE" == lines_i_actually_want[3] or "PRO" == lines_i_actually_want[3] or "SER" == lines_i_actually_want[3] or "THR" == lines_i_actually_want[3] or "TRP" == lines_i_actually_want[3] or "TYR" == lines_i_actually_want[3] or "VAL" == lines_i_actually_want[3]:
                print(lines_i_actually_want.rstrip('\n'))
            else:
                master_output_file.write(lines_i_actually_want)
                #print(lines_i_actually_want.rstrip('\n'))
        master_output_file.close()

The long if statement partially worked as a way to print only the lines containing UNL, LIG, or UNK.
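For what it's worth, part of that condition can never match: `lines_i_actually_want[3]` is a single character, so comparisons such as `"ALA" == lines_i_actually_want[3]` are always false. In the fixed-column PDB layout of the sample lines, the residue name sits in columns 18 to 20, which is the Python slice `[17:20]`. Below is a minimal sketch of a column-based filter; the helper names `residue_name` and `is_wanted_line` are hypothetical, and treating UNL, LIG, and UNK as the ligand codes is an assumption drawn from the question.

```python
# Residue names treated as ligands (assumption, based on the UNL/LIG/UNK
# names mentioned in the question).
LIGAND_RESIDUES = {"UNL", "LIG", "UNK"}


def residue_name(line: str) -> str:
    """Return the residue name of a PDB ATOM/HETATM record.

    The residue name occupies columns 18-20 (1-based), i.e. line[17:20].
    """
    return line[17:20].strip()


def is_wanted_line(line: str) -> bool:
    """True for ATOM/HETATM records whose residue name is a ligand code."""
    if not line.startswith(("ATOM", "HETATM")):
        return False
    return residue_name(line) in LIGAND_RESIDUES


if __name__ == "__main__":
    sample = [
        "HETATM 8567  H   UNL     1     162.011 131.168 160.668  1.00  0.00           H\n",
        "ATOM   8551  O   ASN E 217     145.630 117.961 137.024  1.00  0.00           O\n",
    ]
    for line in sample:
        print(is_wanted_line(line), residue_name(line))
    # prints:
    # True UNL
    # False ASN
```

Reading the residue column directly avoids depending on exact spacing such as `"   UNL     1     "`, which can shift between files written by different tools.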

Help would be greatly appreciated. Thanks!

Upvotes: 0

Views: 36

Answers (1)

CryptoFool

Reputation: 23089

You just want to open the output file for appending, like this:

master_output_file = open(name_of_output_file, "a")

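If you open it with "w", it truncates the file and starts writing from scratch. Because your open() call is inside the loop, each matching input file wipes out whatever the previous one wrote, so only the last file's lines survive. With "a", each write is appended to the existing contents of the file.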
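One side effect of "a" is that the output file also keeps whatever earlier runs of the script wrote. An alternative is to open the output file once, in "w" mode, before the loop, so each run starts from an empty file and every input file's lines are still kept. The sketch below assumes that approach; the function name `collect_ligand_lines` is hypothetical, while the `"Partial Pocket.pdb"` filename filter and the "keep lines containing UNL" rule mirror the question. It uses `pathlib` instead of joining paths by hand with `"\\"`.

```python
from pathlib import Path


def collect_ligand_lines(directory: str, output_path: str, marker: str = "UNL") -> int:
    """Copy every line containing `marker` from the matching files in
    `directory` into one output file and return the number of lines written.

    Hypothetical helper: the filename filter and marker mirror the question.
    """
    written = 0
    # Open the output once, before the loop: "w" truncates it a single time
    # per run, and each input file's lines are then written in turn.
    with open(output_path, "w") as out:
        # sorted() gives a deterministic file order; directory listing order is arbitrary.
        for path in sorted(Path(directory).iterdir()):
            if not path.is_file() or "Partial Pocket.pdb" not in path.name:
                continue
            with open(path) as infile:
                for line in infile:
                    if marker in line:
                        out.write(line)
                        written += 1
    return written


if __name__ == "__main__":
    folder = Path("pockets")  # hypothetical input directory
    if folder.is_dir():
        output = f"All the ligands from {folder.name}.txt"
        count = collect_ligand_lines(str(folder), output)
        print(f"Wrote {count} lines to {output}")
```

Using `with` also closes both files automatically, even if an exception is raised partway through.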

Upvotes: 1
