Python: nesting loops instead of creating multiple inputs and outputs

Question

I just started learning Python and programming, so this is probably a pretty naive question, but I'd appreciate any help.

The following code works, but I've been told that having these multiple inputs and outputs is bad and that I should nest the loops instead. But try as I might, every time I try to nest anything it just ends up giving me an empty folder.

So my question is: how do I nest all of these?

Thanks, and sorry for the long post.

import subprocess
import sys

# 1) Call a Perl script and run it to produce the input file.
perl = "/usr/bin/perl"
perl_script = "geoFF.pl"
params = " --mount-doom-hot"
pl_script = subprocess.Popen([perl, perl_script, params], stdout=sys.stdout)
pl_script.communicate()

## 2) input the output from the perl script but only the wanted data.
# The input is a BIG file and I just want some specific lines from it.
infile1 = "inputperl.txt"  
outfile1 = "c1.txt"   

f1 = open(infile1,'rU')
o1 = open(outfile1,'w+')

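words = ['Acc','title','orgn','date','GP']  # keep only the lines of f1 that contain one of these words

for line in f1:
    if any(word in line for word in words):
        o1.write(line)
# Close both files so c1.txt is fully written before it is reopened below.
f1.close()
o1.close()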

# From those lines, delete some symbols/words I don't want.

input1 = open("c1.txt",'rU')
output1 = open("c2.txt",'w')
del_list = ['>','title', 'orgn','date','<','GP','/Item','"','','','Name=','DocS','Acc'] # I want to keep the rest of the line but not these words.

for line in input1:
    for word in del_list:
        line = line.replace(word, "")
    output1.write(line)
# Close both files so c2.txt is fully written before it is reopened below.
input1.close()
output1.close()

# For one specific word in the lines: AB. The file has lines with AB129, AB8877, AB0997, etc.
# Here I want to prepend a URL to each AB accession so that it becomes a hyperlink.
inp = open("c2.txt",'rU')
out = open("c3.txt",'w')
filedata2 = inp.read()
newdata2 = filedata2.replace('AB', "\n"'http://www.whatever.com/g/qu/acc.cgi?acc=AB')
out.write(newdata2)
inp.close()
out.close()
# This outputs the line as http://www.whatever.com/g/qu/acc.cgi?acc=AB(somenumber),
# for example http://www.whatever.com/g/qu/acc.cgi?acc=AB129
# and http://www.whatever.com/g/qu/acc.cgi?acc=AB8877, etc.

### Then I want to take this file with the changes and send it by email.
from email.MIMEMultipart import MIMEMultipart
from email.MIMEText import MIMEText

fromaddr = "sender@gmail.com"
toaddr = "receiver@gmail.com"
msg = MIMEMultipart()
msg['From'] = fromaddr
msg['To'] = toaddr
msg['Subject'] = "RESULT"

# send txt file in email body
f6 = (open("c3.txt",'rU'))
results = MIMEText(f6.read(),'plain') 
f6.close()
msg.attach(results)

# connect to the SMTP server, convert the message to a string and send it
import smtplib
server = smtplib.SMTP('smtp.gmail.com', 587)
server.ehlo()
server.starttls()
server.ehlo()
server.login("sender email", "password")
text = msg.as_string()
server.sendmail(fromaddr, toaddr, text)

The input file looks like:





    20006767
    AB64767
    
    word word title of this word...
    word word word..word word word..
    11002;13112
    64767
    Mus musculus
    AB
    word word word..word word word..word word word..
    
    
    
    
    2015/12/09
    WIG
       
    
    12
    
    
    
    
    

    200098567
    AB64789
    
    word word word...
    word word word..word word word..
    11002;13112
    AB64789
    Mus musculus
    AB
    word word word..word word word..word word word..
    
    
    
    
    2015/12/09
    WIG
    

         
    200064997
    AB69957
    
    word word word...
    word word word..word word word..
    1100
    69957
    Mus musculus
    AB
    word word word..word word word..word word word..
    
    
    
    
    2015/12/09
    WIG
       
    
    12
    
    
    
    
    
    26476451
    
    
    no

I just want the following data:

AB64767
word word title of this word...
64767
Mus musculus
2015/12/09

But showing as:

http://www.whatever.com/g/qu/acc.cgi?acc=AB64767
word word title of this word...
Mus musculus
2015/12/09

http://www.whatever.com/g/qu/acc.cgi?acc=AB64789
word word title of this word...
Mus musculus
2015/12/09

http://www.whatever.com/g/qu/acc.cgi?acc=AB69957
word word title of this word...
Mus musculus
2015/12/09

Padraic Cunningham · Accepted Answer

Reading the file once and using a regex would be a better approach:

import re
del_list = ['>', 'title', 'orgn', 'date', '<', 'GP', '/Item', '"', '', '', 'Name=', 'DocS',
            'Acc']  # I want to keep the rest of the line but not these words.
words = ['Acc', 'title', 'orgn', 'date', 'GP'] 


rep = re.compile(r'|'.join(del_list))
keep = re.compile(r"|".join(words))
r3 = re.compile(r"AB(?=\d)")

with open("test.txt") as f, open("out.txt","w") as out:
    for line in f:
        # if line contains match from words
        if keep.search(line):
            # replace all unwanted substrings
            line = rep.sub("", line.lstrip())
            line = r3.sub("\n"'http://www.whatever.com/g/qu/acc.cgi?acc=AB', line)
            out.write(line)

out.txt:

Item  Type=String
http://www.whatever.com/g/qu/acc.cgi?acc=AB64767
Item  Type=Stringword word  of this word...
Item  Type=String11002;13112
Item  Type=StringMus musculus
Item  Type=String2015/12/09
Item  Type=String
http://www.whatever.com/g/qu/acc.cgi?acc=AB64789
Item  Type=Stringword word word...
Item  Type=String11002;13112
Item  Type=StringMus musculus
Item  Type=String2015/12/09
Item  Type=String
http://www.whatever.com/g/qu/acc.cgi?acc=AB69957
Item  Type=Stringword word word...
Item  Type=String1100
Item  Type=StringMus musculus
Item  Type=String2015/12/09

If you want to match some words exactly, you will need to use word boundaries in the regexes, or you will end up matching "foo" in "foobar". Also, if all you want to do is send the file, you don't have to write it to disk at all.
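As a rough illustration of both points, here is a minimal sketch. The names `keep_exact`, `keep_loose`, `filter_lines` and the `sample` lines are made up for this example and are not from the question's data. It assumes the keywords should match only as whole words, and it builds the email body in memory instead of writing an intermediate file.

```python
import re
from email.mime.text import MIMEText

words = ['Acc', 'title', 'orgn', 'date', 'GP']

# Loose pattern: matches a keyword anywhere, even inside a longer word.
keep_loose = re.compile("|".join(map(re.escape, words)))
# Exact pattern: \b on both sides only matches the keyword as a whole word.
keep_exact = re.compile(r"\b(?:" + "|".join(map(re.escape, words)) + r")\b")


def filter_lines(lines):
    """Return the kept lines joined into one string, without touching disk."""
    return "".join(line for line in lines if keep_exact.search(line))


# Hypothetical sample lines, invented for illustration only.
sample = [
    "title: a word\n",
    "subtitle: another\n",   # contains "title" only inside a longer word
    "update: none\n",        # contains "date" only inside a longer word
    "date: 2015/12/09\n",
]

body = filter_lines(sample)

# The string can go straight into the message body; no c3.txt is needed.
msg = MIMEText(body, 'plain')
```

Here `keep_loose` would also keep the "subtitle" and "update" lines, while `keep_exact` keeps only the "title" and "date" lines. `filter_lines` accepts any iterable of lines, so an open file object can be passed in place of `sample`.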
