Ana

Reputation: 141

How to pass a command line within a function?

I am trying to unzip fasta.gz files in order to work with them. I have written a script that builds a cmd list, based on something I have done before, but I cannot get the newly created function to work. See below:

import glob
import sys
import os
import argparse
import subprocess
import gzip
#import gunzip

def decompressed_files():
    print ('starting decompressed_files')
    #files where the data is stored
    input_folder=('/home/me/me_files/PB_assemblies_for_me')
    #where I want my data to be
    output_folder=input_folder + '/fasta_files'
    if os.path.exists(output_folder):
        print ('folder already exists')
    else:
        os.makedirs(output_folder)
        print ('folder has been created')

    for f in input_folder:
        fasta=glob.glob(input_folder + '/*.fasta.gz')
        #print (fasta[0])
        #sys.exit()
        cmd =['gunzip', '-k', fasta, output_folder]
        my_file=subprocess.Popen(cmd)
        my_file.wait

decompressed_files()
print ('The programme has finished doing its job')

But this gives the following error:

TypeError: execv() arg 2 must contain only strings

If I write 'fasta' in quotes instead, the programme looks for a file with that name and the error becomes:

fasta.gz: No such file or directory

If I go to the directory where the files are and type gunzip followed by the name of a fasta.gz file, it does the job beautifully, but I have several files in the folder and would like to loop over them. I have used 'cmd' before, as you can see in the code below, and I didn't have any problem with it. This is the earlier code, where I was able to mix string literals and variables:

cmd=['velveth', output, '59', '-fastq.gz', '-shortPaired', fastqs[0], fastqs[1]]
#print cmd
my_file=subprocess.Popen(cmd)#I got this from the documentation.
my_file.wait()

I would be happy to learn other ways to run Linux commands from within a Python function. The code is for Python 2.7; I know it is old, but it is the version installed on the server at work.

Upvotes: 0

Views: 108

Answers (2)

Shakeel

Reputation: 2015

I haven't tested this, but it might solve your unzip problem using a command. Note that gunzip -k keeps both the compressed and the decompressed file, so what is the purpose of the output directory?

import os
import subprocess
import gzip


def decompressed_files():
    print('starting decompressed_files')
    # files where the data is stored
    input_folder=('input')
    # where I want my data to be
    output_folder = input_folder + '/output'
    if os.path.exists(output_folder):
        print('folder already exists')
    else:
        os.makedirs(output_folder)
        print('folder has been created')

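    for f in os.listdir(input_folder):
        if f and f.endswith('.gz'):
            # os.listdir() returns bare names, so prepend the folder
            cmd = ['gunzip', '-k', os.path.join(input_folder, f), output_folder]
            my_file = subprocess.Popen(cmd)
            my_file.wait()  # the parentheses are needed to actually wait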

print(cmd) would then show:

['gunzip', '-k', 'input/sample.gz', 'input/output']

I have a few files in the folder and I would like to create the loop

From the quote above, your actual problem seems to be unzipping multiple *.gz files under a path. In that case the code below should solve your problem.

import os
import gzip
import shutil
import fnmatch

def gunzip(file_path,output_path):
    with gzip.open(file_path,"rb") as f_in, open(output_path,"wb") as f_out:
        shutil.copyfileobj(f_in, f_out)


def make_sure_path_exists(path):
    try:
        os.makedirs(path)
    except OSError:
        if not os.path.isdir(path):
            raise


def recurse_and_gunzip(input_path):
    walker = os.walk(input_path)
    output_path = 'files/output'
    make_sure_path_exists(output_path)
    for root, dirs, files in walker:
        for f in files:
            if fnmatch.fnmatch(f,"*.gz"):
                # strip only the trailing '.gz'; replace() would remove every occurrence
                gunzip(os.path.join(root, f), os.path.join(output_path, f[:-len('.gz')]))


recurse_and_gunzip('files')

source

EDIT:

Using command-line arguments: subprocess.Popen(base_cmd + args) executes a child program in a new process. On Unix, the class uses os.execvp()-like behavior to execute the child program.

fasta.gz: No such file or directory

So every extra element in the cmd list is passed to gunzip as a separate argument. Given the literal string fasta, gunzip looks for a file called fasta.gz, hence the error that fasta.gz was not found.
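
To see how each list element reaches the child program, here is a minimal sketch. It does not call gunzip at all: the Python interpreter stands in as the child process, and the helper name show_argv is made up for illustration.

```python
import subprocess
import sys


def show_argv(args):
    """Run a child Python process and return the arguments it received."""
    # sys.executable stands in for gunzip here (an assumption for the demo);
    # each list element becomes exactly one entry in the child's argv.
    child_code = 'import sys; print("|".join(sys.argv[1:]))'
    out = subprocess.check_output([sys.executable, '-c', child_code] + list(args))
    return out.decode().strip().split('|')


if __name__ == '__main__':
    # 'fasta' arrives as the literal text, not as the contents of a variable
    print(show_argv(['-k', 'fasta', '/some/output']))
```

No shell is involved, so nothing is expanded or split: the child sees the strings exactly as they appear in the list.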

ref and some useful examples

Now, if you want to pass the gz files as command-line arguments, you can still do that with the code below (you might need to polish it a little to fit your needs).

import argparse
import subprocess
import os


def write_to_desired_location(stdout_data,output_path):
    print("Going to write to path", output_path)
    with open(output_path, "wb") as f_out:
        f_out.write(stdout_data)


def decompress_files(gz_files):
    base_path=('files')  # my base path
    output_path = base_path + '/output'  # output path
    if os.path.exists(output_path):
        print('folder already exists')
    else:
        os.makedirs(output_path)
        print('folder has been created')

    for f in gz_files:
        if f and f.endswith('.gz'):
            print('starting decompressed_files', f)
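            proc = subprocess.Popen(['gunzip', '-dc', f], stdout=subprocess.PIPE)  # d: decompress, c: write to stdout
            stdout_data, _ = proc.communicate()  # read all output and wait for gunzip to exit
            # strip only the trailing '.gz' and drop any directory part of f
            write_to_desired_location(stdout_data, os.path.join(output_path, os.path.basename(f)[:-len('.gz')]))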


if __name__ == "__main__":
    parser = argparse.ArgumentParser()
    parser.add_argument(
        "-gzfilelist",
        required=True,
        nargs="+",  # 1 or more arguments
        type=str,
        help='Provide gz files as arguments separated by space Ex: -gzfilelist test1.txt.tar.gz test2.txt.tar.gz'
    )

    args = parser.parse_args()
    my_list = list(args.gzfilelist)  # nargs="+" already yields a list of strings
    decompress_files(gz_files=my_list)

execution:

python unzip_file.py -gzfilelist test.txt.tar.gz

output

folder already exists
('starting decompressed_files', 'test.txt.tar.gz')
('Going to write to path', 'files/output/test.txt.tar')

You can pass multiple gz files as well for example

python unzip_file.py -gzfilelist test1.txt.tar.gz test2.txt.tar.gz test3.txt.tar.gz
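
If the files are large, reading the whole decompressed output into memory may not be ideal. A rough sketch of an alternative, assuming gunzip is on the PATH: loop over the glob results and let gunzip -c write straight into a file in the output folder. The function name gunzip_to_folder and the default pattern are my own choices, not something from the question.

```python
import glob
import os
import subprocess


def gunzip_to_folder(input_folder, output_folder, pattern='*.fasta.gz'):
    """Decompress every matching file into output_folder, keeping the originals."""
    if not os.path.isdir(output_folder):
        os.makedirs(output_folder)
    written = []
    for src in sorted(glob.glob(os.path.join(input_folder, pattern))):
        # drop only the trailing '.gz' from the file name
        dst = os.path.join(output_folder, os.path.basename(src)[:-len('.gz')])
        with open(dst, 'wb') as f_out:
            # -c sends the decompressed data to stdout, which is redirected
            # into dst; check_call raises if gunzip exits with an error
            subprocess.check_call(['gunzip', '-c', src], stdout=f_out)
        written.append(dst)
    return written
```

Because -c never touches the input file, the .gz originals stay in place without needing -k.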

Upvotes: 0

Arnie97

Reputation: 1069

fasta is a list returned by glob.glob(). Hence cmd = ['gunzip', '-k', fasta, output_folder] generates a nested list:

['gunzip', '-k', ['foo.fasta.gz', 'bar.fasta.gz'], output_folder]

but execv() expects a flat list:

['gunzip', '-k', 'foo.fasta.gz', 'bar.fasta.gz', output_folder]

You can use the list concatenation operator + to create a flat list:

cmd = ['gunzip', '-k'] + fasta + [output_folder]
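
A small sketch of the same idea with a check that mirrors what execv() enforces. The helper name build_gunzip_cmd is hypothetical, and the file names stand in for whatever glob.glob() returns. I left output_folder out of this version because, as far as I know, gunzip has no destination-folder argument and would treat that path as one more input.

```python
def build_gunzip_cmd(files, options=('-k',)):
    """Return a flat argument list: program, options, then one entry per file."""
    cmd = ['gunzip'] + list(options) + list(files)
    # execv() rejects anything that is not a string, e.g. a nested list
    for arg in cmd:
        if not isinstance(arg, str):
            raise TypeError('non-string argument: %r' % (arg,))
    return cmd


fasta = ['foo.fasta.gz', 'bar.fasta.gz']  # stand-in for glob.glob(...) results
print(build_gunzip_cmd(fasta))
# prints ['gunzip', '-k', 'foo.fasta.gz', 'bar.fasta.gz']
```

Passing the list itself as one element, as in the question, would hit the TypeError branch before anything is executed.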

Upvotes: 1
