Reputation: 135
I have several hundred data files with the following structure:
ATOM 1 CG TYR C 58 121.612 160.894 112.763 1.00 0.00 C
ATOM 2 CD1 TYR C 58 120.943 162.067 113.040 1.00 0.00 C
ATOM 3 CD2 TYR C 58 121.188 159.746 113.389 1.00 0.00 C
ATOM 4 CE1 TYR C 58 119.873 162.092 113.912 1.00 0.00 C
ATOM 5 CE2 TYR C 58 120.129 159.760 114.258 1.00 0.00 C
ATOM 6 CZ TYR C 58 119.475 160.934 114.519 1.00 0.00 C
ATOM 7 OH TYR C 58 118.415 160.939 115.392 1.00 0.00 O
ATOM 8 OD1 ASN C 60 119.864 156.037 117.108 1.00 0.00 O
ATOM 9 CG PHE C 77 122.548 156.511 110.481 1.00 0.00 C
ATOM 10 CD1 PHE C 77 122.075 155.486 109.711 1.00 0.00 C
ATOM 11 CD2 PHE C 77 122.223 156.541 111.807 1.00 0.00 C
ATOM 12 CE1 PHE C 77 121.216 154.566 110.224 1.00 0.00 C
ATOM 13 CE2 PHE C 77 121.377 155.605 112.335 1.00 0.00 C
ATOM 14 CZ PHE C 77 120.877 154.618 111.540 1.00 0.00 C
ATOM 15 NZ LYS D 156 112.602 154.253 117.823 1.00 0.00 N
ATOM 16 O ILE D 202 108.373 159.140 111.337 1.00 0.00 O
ATOM 17 N VAL D 203 109.786 157.858 110.154 1.00 0.00 N
ATOM 18 CA VAL D 203 110.994 158.530 110.614 1.00 0.00 C
ATOM 19 C VAL D 203 111.459 159.524 109.568 1.00 0.00 C
ATOM 20 CB VAL D 203 112.099 157.518 110.929 1.00 0.00 C
ATOM 21 CG1 VAL D 203 113.424 158.213 111.097 1.00 0.00 C
ATOM 22 CG2 VAL D 203 111.757 156.818 112.216 1.00 0.00 C
ATOM 23 N GLN D 204 111.583 160.788 109.970 1.00 0.00 N
ATOM 24 O GLN D 204 114.017 162.417 110.404 1.00 0.00 O
ATOM 25 CA SER D 205 115.779 162.096 108.277 1.00 0.00 C
ATOM 26 CB SER D 205 116.596 160.967 107.666 1.00 0.00 C
ATOM 27 OG SER D 205 117.961 161.337 107.661 1.00 0.00 O
ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C
ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N
ATOM 30 C UNL X 1 113.499 161.458 113.812 1.00 0.00 C
ATOM 31 O UNL X 1 112.732 162.299 113.334 1.00 0.00 O
ATOM 32 C UNL X 1 114.928 161.844 114.171 1.00 0.00 C
ATOM 33 N UNL X 1 115.842 161.124 113.296 1.00 0.00 N
ATOM 34 C UNL X 1 116.000 159.854 113.475 1.00 0.00 C
ATOM 35 C UNL X 1 115.326 159.120 114.591 1.00 0.00 C
ATOM 36 C UNL X 1 116.110 158.332 115.447 1.00 0.00 C
ATOM 37 C UNL X 1 115.508 157.476 116.361 1.00 0.00 C
ATOM 38 CL UNL X 1 116.480 156.444 117.332 1.00 0.00 CL
ATOM 39 C UNL X 1 114.125 157.429 116.470 1.00 0.00 C
ATOM 40 C UNL X 1 113.345 158.291 115.696 1.00 0.00 C
ATOM 41 C UNL X 1 113.925 159.189 114.776 1.00 0.00 C
ATOM 42 C UNL X 1 116.862 159.133 112.452 1.00 0.00 C
ATOM 43 C UNL X 1 116.961 157.743 112.314 1.00 0.00 C
ATOM 44 C UNL X 1 117.837 157.173 111.379 1.00 0.00 C
ATOM 45 C UNL X 1 118.592 157.982 110.536 1.00 0.00 C
ATOM 46 C UNL X 1 118.471 159.363 110.623 1.00 0.00 C
ATOM 47 C UNL X 1 117.619 159.931 111.575 1.00 0.00 C
ATOM 48 H UNL X 1 111.189 160.691 113.471 1.00 0.00 H
ATOM 49 H UNL X 1 111.218 159.741 114.937 1.00 0.00 H
ATOM 50 H UNL X 1 111.536 158.980 113.396 1.00 0.00 H
END
My goal is to take the lines that have UNL X in them, reinsert them at the beginning of the file, and delete them from their original position. However, my code writes the lines starting at ATOM 50, then counts down to ATOM 49, then ATOM 48, and so on. Essentially, my code prepends to the file backwards and doesn't delete the UNL X lines from their original position, making the final result look like:
ATOM 50 H UNL X 1 111.536 158.980 113.396 1.00 0.00 H
ATOM 49 H UNL X 1 111.218 159.741 114.937 1.00 0.00 H
ATOM 48 H UNL X 1 111.189 160.691 113.471 1.00 0.00 H
ATOM 47 C UNL X 1 117.619 159.931 111.575 1.00 0.00 C
ATOM 46 C UNL X 1 118.471 159.363 110.623 1.00 0.00 C
ATOM 45 C UNL X 1 118.592 157.982 110.536 1.00 0.00 C
ATOM 44 C UNL X 1 117.837 157.173 111.379 1.00 0.00 C
ATOM 43 C UNL X 1 116.961 157.743 112.314 1.00 0.00 C
ATOM 42 C UNL X 1 116.862 159.133 112.452 1.00 0.00 C
ATOM 41 C UNL X 1 113.925 159.189 114.776 1.00 0.00 C
ATOM 40 C UNL X 1 113.345 158.291 115.696 1.00 0.00 C
ATOM 39 C UNL X 1 114.125 157.429 116.470 1.00 0.00 C
ATOM 38 CL UNL X 1 116.480 156.444 117.332 1.00 0.00 CL
ATOM 37 C UNL X 1 115.508 157.476 116.361 1.00 0.00 C
ATOM 36 C UNL X 1 116.110 158.332 115.447 1.00 0.00 C
ATOM 35 C UNL X 1 115.326 159.120 114.591 1.00 0.00 C
ATOM 34 C UNL X 1 116.000 159.854 113.475 1.00 0.00 C
ATOM 33 N UNL X 1 115.842 161.124 113.296 1.00 0.00 N
ATOM 32 C UNL X 1 114.928 161.844 114.171 1.00 0.00 C
ATOM 31 O UNL X 1 112.732 162.299 113.334 1.00 0.00 O
ATOM 30 C UNL X 1 113.499 161.458 113.812 1.00 0.00 C
ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N
ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C
ATOM 1 CG TYR C 58 121.612 160.894 112.763 1.00 0.00 C
ATOM 2 CD1 TYR C 58 120.943 162.067 113.040 1.00 0.00 C
ATOM 3 CD2 TYR C 58 121.188 159.746 113.389 1.00 0.00 C
ATOM 4 CE1 TYR C 58 119.873 162.092 113.912 1.00 0.00 C
ATOM 5 CE2 TYR C 58 120.129 159.760 114.258 1.00 0.00 C
ATOM 6 CZ TYR C 58 119.475 160.934 114.519 1.00 0.00 C
ATOM 7 OH TYR C 58 118.415 160.939 115.392 1.00 0.00 O
ATOM 8 OD1 ASN C 60 119.864 156.037 117.108 1.00 0.00 O
ATOM 9 CG PHE C 77 122.548 156.511 110.481 1.00 0.00 C
ATOM 10 CD1 PHE C 77 122.075 155.486 109.711 1.00 0.00 C
ATOM 11 CD2 PHE C 77 122.223 156.541 111.807 1.00 0.00 C
ATOM 12 CE1 PHE C 77 121.216 154.566 110.224 1.00 0.00 C
ATOM 13 CE2 PHE C 77 121.377 155.605 112.335 1.00 0.00 C
ATOM 14 CZ PHE C 77 120.877 154.618 111.540 1.00 0.00 C
ATOM 15 NZ LYS D 156 112.602 154.253 117.823 1.00 0.00 N
ATOM 16 O ILE D 202 108.373 159.140 111.337 1.00 0.00 O
ATOM 17 N VAL D 203 109.786 157.858 110.154 1.00 0.00 N
ATOM 18 CA VAL D 203 110.994 158.530 110.614 1.00 0.00 C
ATOM 19 C VAL D 203 111.459 159.524 109.568 1.00 0.00 C
ATOM 20 CB VAL D 203 112.099 157.518 110.929 1.00 0.00 C
ATOM 21 CG1 VAL D 203 113.424 158.213 111.097 1.00 0.00 C
ATOM 22 CG2 VAL D 203 111.757 156.818 112.216 1.00 0.00 C
ATOM 23 N GLN D 204 111.583 160.788 109.970 1.00 0.00 N
ATOM 24 O GLN D 204 114.017 162.417 110.404 1.00 0.00 O
ATOM 25 CA SER D 205 115.779 162.096 108.277 1.00 0.00 C
ATOM 26 CB SER D 205 116.596 160.967 107.666 1.00 0.00 C
ATOM 27 OG SER D 205 117.961 161.337 107.661 1.00 0.00 O
ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C
ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N
ATOM 30 C UNL X 1 113.499 161.458 113.812 1.00 0.00 C
ATOM 31 O UNL X 1 112.732 162.299 113.334 1.00 0.00 O
ATOM 32 C UNL X 1 114.928 161.844 114.171 1.00 0.00 C
ATOM 33 N UNL X 1 115.842 161.124 113.296 1.00 0.00 N
ATOM 34 C UNL X 1 116.000 159.854 113.475 1.00 0.00 C
ATOM 35 C UNL X 1 115.326 159.120 114.591 1.00 0.00 C
ATOM 36 C UNL X 1 116.110 158.332 115.447 1.00 0.00 C
ATOM 37 C UNL X 1 115.508 157.476 116.361 1.00 0.00 C
ATOM 38 CL UNL X 1 116.480 156.444 117.332 1.00 0.00 CL
ATOM 39 C UNL X 1 114.125 157.429 116.470 1.00 0.00 C
ATOM 40 C UNL X 1 113.345 158.291 115.696 1.00 0.00 C
ATOM 41 C UNL X 1 113.925 159.189 114.776 1.00 0.00 C
ATOM 42 C UNL X 1 116.862 159.133 112.452 1.00 0.00 C
ATOM 43 C UNL X 1 116.961 157.743 112.314 1.00 0.00 C
ATOM 44 C UNL X 1 117.837 157.173 111.379 1.00 0.00 C
ATOM 45 C UNL X 1 118.592 157.982 110.536 1.00 0.00 C
ATOM 46 C UNL X 1 118.471 159.363 110.623 1.00 0.00 C
ATOM 47 C UNL X 1 117.619 159.931 111.575 1.00 0.00 C
ATOM 48 H UNL X 1 111.189 160.691 113.471 1.00 0.00 H
ATOM 49 H UNL X 1 111.218 159.741 114.937 1.00 0.00 H
ATOM 50 H UNL X 1 111.536 158.980 113.396 1.00 0.00 H
END
Here's what I've tried so far:
import os


def prepend_line(file_name, line):
    with open(file_name, "r+") as f:
        s = f.read()
        f.seek(0)
        f.write(line + s)


pathway = r'C:\Users\Family\Desktop\GABA Project\GABA Structures\New Ligands With Hydrogens\Similar To Valium\Mcule 6HUP Entire ECD Diazepam RENUMBERING TEST'  # first define the subdirectory
pathway_tree = os.walk(pathway)
os.chdir(pathway)
for subdir, dirs, files_in_dirs in pathway_tree:
    #print(f"dirs! {dirs}")
    pass
    for file_names in files_in_dirs:
        try:
            if "Partial Pocket" in file_names and ".pdb" in file_names:
                os.chdir(subdir)  # changes to the specific sub directory using the great filter
                with open(file_names, "r") as input:
                    for input_file_line in input:
                        # captures each line in the file as an item in an array,
                        # then splits the line into its own array of strings
                        array_of_words_in_line = input_file_line.split()
                        three_letter_code = array_of_words_in_line[3]  # ie, UNK
                        if three_letter_code == "UNL" or three_letter_code == "UNK" or three_letter_code == "LIG":
                            prepend_line(file_names, input_file_line)
        except IndexError:
            pass
        except NameError:
            pass
Upvotes: 0
Views: 46
Reputation: 3550
This type of text-based data processing is basically what Unix command-line tools were made for. Here is a one-liner to do it with sed:
$ (sed -n /UNL/p data.txt; sed /UNL/d data.txt) > processed_data.txt
Explanation: the first sed invocation prints all lines containing UNL. The second one deletes all lines containing UNL and prints the rest. The combined output is redirected to processed_data.txt.
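If you would rather stay in Python, the same two-pass idea can be sketched as below. This is only an illustration: the function name move_matching_first and its needle parameter are made up for this example, and the sample lines are copied from the question. Like the sed version, it matches UNL anywhere in the line rather than in a specific column.

```python
def move_matching_first(lines, needle="UNL"):
    """Return the lines containing `needle` first, then all other lines.

    Each group keeps its original relative order.
    """
    matching = [ln for ln in lines if needle in ln]        # like: sed -n /UNL/p
    remaining = [ln for ln in lines if needle not in ln]   # like: sed /UNL/d
    return matching + remaining


# A few lines taken from the question's sample file.
sample = [
    "ATOM 27 OG SER D 205 117.961 161.337 107.661 1.00 0.00 O\n",
    "ATOM 28 C UNL X 1 111.662 159.873 113.972 1.00 0.00 C\n",
    "ATOM 29 N UNL X 1 113.085 160.155 114.126 1.00 0.00 N\n",
    "END\n",
]

result = move_matching_first(sample)
print("".join(result), end="")
```

Because both passes read from the same unmodified list, the UNL lines come out in their original order (28 before 29) and appear only once.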
Upvotes: 0
Reputation: 4033
You could just create a completely new list of lines, and then write those lines to an output file:
i = 0
new_text = []
with open(filename, "r") as fi:
    for line in fi:
        fields = line.split()
        # lines such as "END" have fewer than four fields, so check the length first
        if len(fields) > 3 and fields[3] in {"UNL", "UNK", "LIG"}:
            new_text.insert(i, line)  # reinsert at start of list, after earlier ligand lines
            i += 1
        else:
            new_text.append(line)  # append to the end of the list
new_text = "".join(new_text)
with open(filename, "w") as fo:
    fo.write(new_text)
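To apply this to several hundred files, one option is to wrap the reordering in a function and walk the directory tree. The sketch below is an assumption about how that might look, not a tested solution for your data: the names is_ligand, reorder_ligand_first and process_tree are hypothetical, the "Partial Pocket" and ".pdb" filter is taken from the question, and splitting on whitespace assumes the fourth field is always the residue name, as in the sample shown.

```python
from pathlib import Path

# Residue names treated as ligand, taken from the question's code.
LIGAND_CODES = {"UNL", "UNK", "LIG"}


def is_ligand(line):
    """True if the fourth whitespace-separated field is a ligand residue name."""
    fields = line.split()
    return len(fields) > 3 and fields[3] in LIGAND_CODES


def reorder_ligand_first(path):
    """Rewrite one file so ligand lines come first, in their original order."""
    path = Path(path)
    lines = path.read_text().splitlines(keepends=True)
    ligand = [ln for ln in lines if is_ligand(ln)]
    other = [ln for ln in lines if not is_ligand(ln)]
    # The whole file is read before it is rewritten, so nothing is
    # duplicated or reversed.
    path.write_text("".join(ligand + other))


def process_tree(root):
    """Reorder every 'Partial Pocket' .pdb file found below root."""
    for pdb in Path(root).rglob("*.pdb"):
        if "Partial Pocket" in pdb.name:
            reorder_ligand_first(pdb)
```

Calling process_tree(pathway) with the directory from the question would then rewrite each matching file in place, so it is worth trying it on a copy of the data first.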
Upvotes: 1