Purrsia

Reputation: 896

Using a perl script to loop through files in directory

I have a directory filled with several thousand .txt files. I need to run the same Perl script on each .txt file and, once the script has finished with a file, save that file's result under a unique name. Please forgive the basic inquiry, as I am learning Perl with this script for the first time.

I have seen other posts addressing this issue: How can I loop through files in a directory in Perl? and running loops through terminal: Take all files in dir and for each file do the same perl procedure.

A bit about my data: These are blastx subject sequence ID results

$head file1.txt
GCN2_SCHPO
GCN2_YEAST
GCN20_YEAST
GCNK_GLUOX

$head file2.txt
PDXA_RUEST
PDXA_SULSY
PDXA_SYNFM
PDXA_SYNY3

My Perl script uses UniProt's Retrieve/ID mapping service programmatically (Retrieve/ID mapping), instead of submitting thousands of requests manually:

use warnings;
use LWP::UserAgent;

@files = <*.txt>; # Files containing lists of UniProt IDs.

my $base = 'http://www.uniprot.org';
my $tool = 'uploadlists';

my $contact = ''; # Please set your email address here to help us debug in case of problems.
my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push @{$agent->requests_redirectable}, 'POST';

foreach $file (@files) {
my $response = $agent->post("$base/$tool/",
  [ 'file' => [@files],
    'format' => 'tab',
    'from' => 'ACC+ID',
    'to' => 'ACC',
    'columns' => 'id,database(ko)',
  ],
  'Content_Type' => 'form-data');

while (my $wait = $response->header('Retry-After')) {
  print STDERR "Waiting ($wait)...\n";
  sleep $wait;
  $response = $agent->get($response->base);
}

$response->is_success ?
print $response->content :
die 'Failed, got ' . $response->status_line .
    ' for ' . $response->request->uri . "\n";
print $file . "\n";
}

This script, instead of looping through each .txt file, only grabs the first .txt file in my directory and performs this function over and over again on that one file. However, at the end of each pass, it prints the correct file name. Here is an example output:

Entry   Cross-reference (ko) yourlist:M20170501A isomap:M201705
Q9HGN1  K16196; GCN2_SCHPO  
P15442  K16196; GCN2_YEAST  
P43535  K06158; GCN20_YEAST 
Q5FQ97  K00851; GCNK_GLUOX
file1.txt

Entry   Cross-reference (ko) yourlist:M20170501A isomap:M201705
Q9HGN1  K16196; GCN2_SCHPO  
P15442  K16196; GCN2_YEAST  
P43535  K06158; GCN20_YEAST 
Q5FQ97  K00851; GCNK_GLUOX
file2.txt 

I have tried to do this via terminal with the following loop as well:

for i in *; do perl script.pl $i $i.txt; done

and I get the same results.

I am missing something very simple and am asking for your wisdom on understanding why this loop is being loopy. Secondly, is there a way to code this (in the script or via terminal) to name each result of each .txt file differently?

Thank-you!

Upvotes: 1

Views: 2544

Answers (1)

Borodin

Reputation: 126762

Your for loop foreach $file (@files) { ... } executes the following block repeatedly, setting $file to each file name in turn. But inside the loop you try to pass all of the files at once, using the parameter 'file' => [@files].

LWP treats that list as a file path, a file name, and a number of header names and values, so the data uploaded always comes from the first file in @files.
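
To see this without contacting UniProt, the sketch below builds the same kind of multipart request with HTTP::Request::Common (the module that $agent->post uses to build form posts) and inspects the body instead of sending it. The localhost URL, the helper names build_upload_request and write_ids, and the temporary sample files are all made up for the illustration.

```perl
use strict;
use warnings;

use File::Temp qw(tempdir);
use HTTP::Request::Common qw(POST);

# Build (but do not send) a multipart upload request so its body can be inspected.
# The URL is a placeholder; nothing is contacted.
sub build_upload_request {
    my @upload_spec = @_;
    return POST(
        'http://localhost/uploadlists/',
        Content_Type => 'form-data',
        Content      => [ file => [@upload_spec] ],
    );
}

# Write a small file of IDs into a directory and return its path.
sub write_ids {
    my ($dir, $name, @ids) = @_;
    my $path = "$dir/$name";
    open my $fh, '>', $path or die "Cannot write $path: $!";
    print {$fh} "$_\n" for @ids;
    close $fh or die "Cannot close $path: $!";
    return $path;
}

my $dir   = tempdir(CLEANUP => 1);
my $file1 = write_ids($dir, 'file1.txt', qw(GCN2_SCHPO GCN2_YEAST));
my $file2 = write_ids($dir, 'file2.txt', qw(PDXA_RUEST PDXA_SULSY));

# Two paths in one array ref: the first is read, the second is only used
# as the filename label in the Content-Disposition header.
my $both = build_upload_request($file1, $file2)->content;
print "both paths: body ", ($both =~ /PDXA_RUEST/ ? "contains" : "lacks"), " file2 data\n";

# One path per request: that file's own data is uploaded.
my $single = build_upload_request($file2)->content;
print "one path:   body ", ($single =~ /PDXA_RUEST/ ? "contains" : "lacks"), " file2 data\n";
```

With two paths the body should carry only the IDs from file1.txt, which matches the repeated output in the question. Only two sample files are used here on purpose; with more, the extra names would be read as header name/value pairs.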

The quick solution is to replace that line with file => [ $file ] and then it should work, but there are a few other issues with your code, so I've written this refactoring.

I'm not in a position to test this at present, but it does compile.

use strict;
use warnings 'all';

use LWP::UserAgent;

my @files = glob '*.txt'; # Files containing lists of UniProt IDs.

my $base    = 'http://www.uniprot.org';
my $tool    = 'uploadlists';
my $contact = ''; # Please set your email address here
                  # to help us debug in case of problems.

my $agent = LWP::UserAgent->new(agent => "libwww-perl $contact");
push @{$agent->requests_redirectable}, 'POST';

for my $file ( @files ) {

    my $response = $agent->post(
        "$base/$tool/",
        Content_Type => 'form-data',
        Content      => [
            file     => [ $file ],
            format   => 'tab',
            from     => 'ACC+ID',
            to       => 'ACC',
            columns  => 'id,database(ko)',
        ],
    );

    while ( my $wait = $response->header('Retry-After') ) {
        print STDERR "Waiting ($wait)...\n";
        sleep $wait;
        $response = $agent->get($response->base);
    }

    if ( $response->is_success ) {
        print $response->content;
    }
    else {
        die sprintf "Failed. Got %s for %s\n",
            $response->status_line,
            $response->request->uri;
    }
}
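
As for giving each result its own name, one possible approach is to derive the output name from the input name and write the response there instead of printing it. This is only a sketch: the helper names output_name_for and save_result and the .ko.tab suffix are illustrative choices, not anything UniProt or LWP requires.

```perl
use strict;
use warnings;

# Map an input name such as "file1.txt" to "file1.ko.tab".
# The ".ko.tab" suffix is an arbitrary, illustrative choice.
sub output_name_for {
    my ($input) = @_;
    (my $stem = $input) =~ s/\.txt\z//;   # drop a trailing ".txt" if present
    return "$stem.ko.tab";
}

# Write one result to the file derived from its input name
# and return the name that was used.
sub save_result {
    my ($input, $content) = @_;
    my $out = output_name_for($input);
    open my $fh, '>', $out or die "Cannot write $out: $!";
    print {$fh} $content;
    close $fh or die "Cannot close $out: $!";
    return $out;
}

# Show the mapping for the two sample inputs from the question.
print output_name_for($_), "\n" for qw(file1.txt file2.txt);
```

Inside the loop, print $response->content; would then become save_result($file, $response->content); so that file1.txt produces file1.ko.tab, file2.txt produces file2.ko.tab, and so on.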

Upvotes: 1
