Reputation: 571
I am trying to do pattern matching in Perl code. An example will make it easier to explain.
I am trying to use the following link with samtools view.
samtools allows only a specific region of the data to be retrieved, with the following syntax:
samtools view -h ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00132/alignment/HG00132.mapped.SOLID.bfast.GBR.low_coverage.20111114.bam 1:123-1234
There are 1000 of them, and the 'GBR' bit of the link changes. So I wrote a simple Perl script and replaced the link with 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00132/alignment/HG00132.mapped.SOLID.bfast.*.bam', but the link is not recognized. Is there a way to use * as in Unix, but in the middle of the text rather than at the end? I want to replace GBR with a star while keeping the 'bam' bit at the far end of the file name. (I do not need to download the file.)
Thank you in advance
Upvotes: 0
Views: 286
Reputation: 39158
Use LWP to browse FTP. There are no hyperlinks, so you have to parse the listing to distinguish among files you want to mirror. Shell globs like * do not work, but regexes are suitable.
Untested example: collecting all bam URIs from the specified directory.
use strict;
use warnings;
use File::Listing qw(parse_dir);
use LWP::UserAgent qw();

my @bam_files;
my $base = 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00132/alignment/';
my $ua = LWP::UserAgent->new;
my $index = $ua->get($base);
die $index->status_line unless $index->is_success;

# Each entry is [name, type, size, mtime, mode]; keep names ending in ".bam".
for my $entry (parse_dir($index->decoded_content)) {
    my $filename = $entry->[0];
    next unless $filename =~ /\.bam$/;
    push @bam_files, $base . $filename;
}
It is impossible to use an FTP file without downloading it first (see method get in LWP::UserAgent). This does not imply also saving it on the local filesystem (that would be method mirror).
samtools must do this behind the scenes, too, perhaps using protocol extensions to download only ranges, not the full file.
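To illustrate the "regexes instead of globs" point, here is a minimal sketch of how the wildcard from the question could be written as a Perl regex and applied to a list of file names. The listing is hard-coded and partly hypothetical (only the first name comes from the question); in practice the names would come from the parsed FTP listing. The `[^.]+` for the varying 'GBR' part is an assumption about how the file names are structured.

```perl
use strict;
use warnings;

my $base   = 'ftp://ftp.1000genomes.ebi.ac.uk/vol1/ftp/data/HG00132/alignment/';
my $region = '1:123-1234';

# Hypothetical listing for illustration; only the first name is from the question.
my @listing = (
    'HG00132.mapped.SOLID.bfast.GBR.low_coverage.20111114.bam',
    'HG00132.mapped.SOLID.bfast.GBR.low_coverage.20111114.bam.bai',
    'HG00132.unmapped.SOLID.bfast.GBR.low_coverage.20111114.bam',
    'README.txt',
);

# Regex counterpart of "HG00132.mapped.SOLID.bfast.*.bam":
#   \.     a literal dot (an unescaped . would match any character)
#   [^.]+  the part that varies (GBR here), assumed to contain no dots
#   .*     whatever follows, up to the final ".bam"
#   ^ $    anchor both ends so that ".bam.bai" is rejected
my $wanted = qr/^HG00132\.mapped\.SOLID\.bfast\.[^.]+\..*\.bam$/;

# Build one samtools argument list per matching file.
my @commands = map  { [ 'samtools', 'view', '-h', $base . $_, $region ] }
               grep { $_ =~ $wanted } @listing;

print join(' ', @$_), "\n" for @commands;

# To actually run them (requires samtools on the PATH):
# system(@$_) == 0 or warn "samtools failed: $?" for @commands;
```

Passing the command as a list to system avoids the shell entirely, so no character in the URL is ever treated as a glob.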
Upvotes: 1
Reputation: 174624
From wget advanced usage examples:
You want to download all the GIFs from an HTTP directory.
wget http://host/dir/*.gif doesn't work, since HTTP retrieval does not support
globbing. In that case, use:
wget -r -l1 --no-parent -A.gif http://host/dir/
Upvotes: 0