Reputation: 313
I have a folder full of files whose names look like this:
"Code1_B1_1.1.fq.gz"
"Code1_B1_2.2.fq.gz"
"Code1_B2_1.1.fq.gz"
"Code1_B2_2.2.fq.gz"
...
"Code5_B1_1.1.fq.gz"
"Code5_B1_2.2.fq.gz"
"Code5_B2_1.1.fq.gz"
... etc.
These are DNA sequences. I want to concatenate these files according to the Code number AND the extension. For example, my files "Code1_B1_1.1.fq.gz" and "Code1_B2_1.1.fq.gz" should be merged into a single "Code1_both_1.1.fq.gz".
Using bash (as a novice), I found out how to list the files I need to concatenate, for example:
ls | grep -E "Code1.*.1.1.fq.gz"
But how can I concatenate them afterwards? I wanted to simply use the cat command and save the output to a new file, but how do I retrieve the files I was able to list with ls?
Ultimately, I would also like to do the whole thing from a Python script that automatically merges all my files according to my two criteria (Code and extension) :)
Thank you in advance for your help!
Chrys
Upvotes: 1
Views: 256
Reputation: 65
Try listing all the files, then grep for the ones you want and store the result in a file.
ls -ltra | egrep -e 'Code1_B1_1.1.fq.gz|Code1_B1_2.2.fq.gz|Code1_B2_1.1.fq.gz|Code1_B2_2.2.fq.gz' > filename
OR
ls | zip -@m filename.zip
Upvotes: 0
Reputation: 295383
ls output is for human use, not programmatic consumption; see "Why you shouldn't parse the output of ls".
Instead, use a glob expression to form a list of filenames:
zcat Code1*1.1.fq.gz >outfile
...or...
gunzip -c Code1*1.1.fq.gz >outfile
If you need to quote parts of this name for some reason, you can do that so long as you don't quote the * (or any other glob-expression metacharacter):
gunzip -c "Code1"*"1.1.fq.gz"
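Since the question also asks for a Python version that handles every Code and extension at once, here is a minimal sketch of one way to do it. It assumes the file names follow the pattern shown in the question (Code<number>_B<number>_<suffix>), and the names NAME_RE, group_files and merge_groups are made up for this illustration. Note that it copies the compressed bytes as-is, like cat would; concatenated gzip files still form a valid gzip stream, so the output stays compressed, whereas the zcat / gunzip -c commands above write decompressed data.

```python
import re
import shutil
from collections import defaultdict
from pathlib import Path

# Assumed naming scheme, taken from the question's examples:
# "Code1_B1_1.1.fq.gz" -> code "Code1", batch "B1", suffix "1.1.fq.gz".
NAME_RE = re.compile(r"^(Code\d+)_(B\d+)_(\d+\.\d+\.fq\.gz)$")


def group_files(folder):
    """Map (code, suffix) -> sorted list of matching paths in `folder`."""
    groups = defaultdict(list)
    for path in sorted(Path(folder).iterdir()):
        match = NAME_RE.match(path.name)
        if match:
            code, _batch, suffix = match.groups()
            groups[(code, suffix)].append(path)
    return groups


def merge_groups(folder, out_folder):
    """Write one "<Code>_both_<suffix>" file per group; return the new paths."""
    out_folder = Path(out_folder)
    out_folder.mkdir(parents=True, exist_ok=True)
    written = []
    for (code, suffix), paths in sorted(group_files(folder).items()):
        target = out_folder / f"{code}_both_{suffix}"
        with open(target, "wb") as out:
            for path in paths:
                with open(path, "rb") as src:
                    # Raw byte copy, the equivalent of `cat a.gz b.gz > out.gz`.
                    shutil.copyfileobj(src, out)
        written.append(target)
    return written
```

Calling merge_groups(".", "merged") would then write files such as merged/Code1_both_1.1.fq.gz. Files that do not match the assumed pattern (including earlier "_both_" outputs) are skipped, so adjust NAME_RE if the real names differ.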
Note that glob expressions are a bit different from regular expressions: In regex, . is a special character that matches any single character, so grep -E "Code1.*.1.1.fq.gz" would also match Code1AB1C1DfqEgz as a valid name, since each and every . in the expression is treated that way. In globs, . is not special, and * means zero-or-more-of-anything (as opposed to zero-or-more-of-the-last-thing).
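The difference can be checked from Python, which is where the question ultimately wants to end up. This is only an illustrative sketch: re.search stands in for grep's matching, fnmatchcase stands in for shell globbing, and the names UNESCAPED, ESCAPED, GLOB and compare are invented for the example.

```python
import re
from fnmatch import fnmatchcase

# The regex from the question: every "." matches any single character.
UNESCAPED = re.compile(r"Code1.*.1.1.fq.gz")
# One possible stricter regex: literal dots are escaped with a backslash.
ESCAPED = re.compile(r"Code1.*1\.1\.fq\.gz$")
# The glob from the answer: "." is literal, "*" matches any run of characters.
GLOB = "Code1*1.1.fq.gz"


def compare(name):
    """Return (unescaped regex matches, escaped regex matches, glob matches)."""
    return (
        bool(UNESCAPED.search(name)),
        bool(ESCAPED.search(name)),
        fnmatchcase(name, GLOB),
    )


print(compare("Code1_B1_1.1.fq.gz"))  # (True, True, True)
print(compare("Code1AB1C1DfqEgz"))    # (True, False, False)
```

Only the unescaped regex accepts the bogus name; the escaped regex and the glob both reject it.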
Upvotes: 1