Reputation: 61
**Loop to concatenate multiple pairs of files with almost the same name in UNIX**
There appears to be a partial answer here, but that user ran into the same problem I did: it's important to preserve the original naming scheme.
I have a folder with paired files; the names look like this (all stored in the same folder/directory):
MX_HF20.1.fq.gz; MX_HF20.rem.1.fq.gz
MX_HF22.1.fq.gz; MX_HF22.rem.1.fq.gz
.
.
.
SD_F296.1.fq.gz; SD_F296.rem.1.fq.gz
SD_F297.1.fq.gz; SD_F297.rem.1.fq.gz
(Some of you might recognize this as STACKS output!)
Really, I'm just looking to append the contents of the *.1.rem.fq.gz file to the end of the *.1.fq.gz file, keeping the original *.1.fq.gz file's name.
I've toyed around with test files, so I know cat will do this even though the files are .gz. But my bash scripting abilities are poor at best, and working with and storing name variables is a concept I'm still struggling to grasp.
Many thanks!
Upvotes: 1
Views: 67
Reputation: 1112
It sounds like you are looking for something like this:
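#!/bin/bash
for file1 in *.1.fq.gz; do
    # Build the partner name: X.1.fq.gz -> X.1.rem.fq.gz
    file2=`echo "$file1" | sed -E 's/(.*\.1)\.fq\.gz/\1.rem.fq.gz/'`
    # Write the concatenation to a new file; the inputs are left untouched
    cat "$file1" "$file2" > "out.$file1"
done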
Backquotes execute a shell command and insert its text output at that point in your script.
sed is the Unix stream editor; it manipulates lines of text.
It uses regular expressions, and in this case you need () to group and capture the first part of the file name and \1 to refer back to it.
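Since the goal in the question is to keep the original *.1.fq.gz name, here is a minimal sketch of a variant that appends in place and builds the partner name with shell parameter expansion instead of sed. It assumes the partner files are named *.1.rem.fq.gz, as in the question's description; if they are really named *.rem.1.fq.gz, as in the listing, the glob would also match them and the suffix logic would need to change. The function name append_rem_files and the demo files are made up for illustration.

```shell
#!/bin/bash
# Sketch: append each X.1.rem.fq.gz to its partner X.1.fq.gz in place.
# append_rem_files is a hypothetical helper name, not part of STACKS.
append_rem_files() {
    local dir=$1 main rem
    for main in "$dir"/*.1.fq.gz; do
        [ -e "$main" ] || continue          # glob matched nothing
        rem="${main%.fq.gz}.rem.fq.gz"      # X.1.fq.gz -> X.1.rem.fq.gz
        if [ -f "$rem" ]; then
            cat "$rem" >> "$main"           # append; concatenated gzip streams stay readable
        else
            echo "no partner found for $main" >&2
        fi
    done
}

# Demo on throwaway files in a temporary directory
demo_dir=$(mktemp -d)
printf 'read1\n' | gzip > "$demo_dir/SD_F297.1.fq.gz"
printf 'read2\n' | gzip > "$demo_dir/SD_F297.1.rem.fq.gz"
append_rem_files "$demo_dir"
result=$(gzip -dc "$demo_dir/SD_F297.1.fq.gz")
echo "$result"
```

Because this uses >>, running it twice appends the .rem contents twice, so it is worth trying on copies first.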
Upvotes: 0
Reputation: 47169
Maybe try using bash rematch:
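#!/bin/bash
# Group 1 captures the letter prefix (e.g. SD_F), group 2 the sample number
p='([A-Z]+_[A-Z]+)([0-9]+)\.1\.rem\.fq\.gz'
for f in *.gz; do
    # Only the .rem files match; captures are stored in BASH_REMATCH
    if [[ $f =~ $p ]]; then
        # Append the .rem file to the matching .1.fq.gz file
        cat "${f}" >> "${BASH_REMATCH[1]}${BASH_REMATCH[2]}.1.fq.gz"
    fi
done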
So, for example, SD_F297.1.rem.fq.gz would be appended to SD_F297.1.fq.gz.
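To see what the captures hold for that example name, a small sketch like the following may help. It uses the same pattern with ^ and $ anchors added (my assumption, to avoid partial matches), and the variable names prefix, number and target are only illustrative.

```shell
#!/bin/bash
# Show what BASH_REMATCH holds for one example file name.
# The ^ and $ anchors are an added assumption, not part of the loop above.
p='^([A-Z]+_[A-Z]+)([0-9]+)\.1\.rem\.fq\.gz$'
f='SD_F297.1.rem.fq.gz'

if [[ $f =~ $p ]]; then
    prefix=${BASH_REMATCH[1]}            # letters around the underscore
    number=${BASH_REMATCH[2]}            # digits that follow
    target="${prefix}${number}.1.fq.gz"  # file that would be appended to
    echo "prefix=$prefix number=$number target=$target"
else
    echo "no match for $f" >&2
fi
```

Nothing is read or written here, so it is safe to run anywhere before pointing the real loop at your data.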
Upvotes: 1