Concatenating Pairs of Files with Specific Naming Scheme (UBUNTU)

Question

**There appears to be a partial answer in "Loop to concatenate multiple pairs of files with almost the same name in UNIX", but the user there encountered the same problem I did: it's important to preserve the original naming scheme.**

I have a folder with paired files; the names look like this (all stored in the same folder/directory):

MX_HF20.1.fq.gz; MX_HF20.rem.1.fq.gz

MX_HF22.1.fq.gz; MX_HF22.rem.1.fq.gz

.

SD_F296.1.fq.gz; SD_F296.rem.1.fq.gz

SD_F297.1.fq.gz; SD_F297.rem.1.fq.gz

(Some of you might recognize this as STACKS output!)

Really, I'm just looking to append the contents of the *.1.rem.fq.gz file to the end of the *.1.fq.gz file, keeping the original *.1.fq.gz file's name.

I've toyed around with test files, so I know cat will do this even though the files are .gz. But my bash scripting abilities are poor at best, and working with and storing name variables is a concept I'm still struggling to grasp.

Many thanks!

l'L'l · Accepted Answer

Maybe try bash regex matching with the BASH_REMATCH array:

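#!/bin/bash

# Capture the sample prefix (e.g. SD_F) and number (e.g. 297) from each *.1.rem.fq.gz name
p='([A-Z]+_[A-Z]+)([0-9]+)\.1\.rem\.fq\.gz'

for f in *.gz; do
    if [[ $f =~ $p ]]; then
        # Rebuild the partner's name from the two capture groups and append to it
        cat "${f}" >> "${BASH_REMATCH[1]}${BASH_REMATCH[2]}.1.fq.gz"
    fi
done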

So for example:

SD_F297.1.rem.fq.gz would be appended to SD_F297.1.fq.gz
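If the sample names don't all fit the `[A-Z]+_[A-Z]+[0-9]+` pattern, a suffix-stripping variant might be easier to adapt. The following is only a sketch: `append_rem_files` and its three parameters are hypothetical names, not part of the answer above. Note that the file listing in the question shows `.rem.1.fq.gz` while the text and the regex use `.1.rem.fq.gz`, so the suffix is left as an argument rather than hard-coded.

```bash
#!/bin/bash

# Hypothetical helper (a sketch, not the accepted answer's method):
# append every "<sample><rem_suffix>" file in a directory to its partner
# "<sample><main_suffix>", leaving the partner's name unchanged.
# The body runs in a subshell, so its variables do not leak to the caller.
append_rem_files() (
    dir=${1:-.}                      # directory holding the paired files
    rem_suffix=${2:-.1.rem.fq.gz}    # suffix of the files to append (assumed default)
    main_suffix=${3:-.1.fq.gz}       # suffix of the files that receive the data

    for rem in "$dir"/*"$rem_suffix"; do
        [ -e "$rem" ] || continue    # the glob matched nothing
        sample=${rem%"$rem_suffix"}  # strip the suffix, keep directory and sample name
        target=$sample$main_suffix
        if [ -f "$target" ]; then
            cat "$rem" >> "$target"
        else
            echo "skipping $rem: no partner $target" >&2
        fi
    done
)
```

Calling `append_rem_files .` handles the `.1.rem.fq.gz` naming in the current directory; `append_rem_files . .rem.1.fq.gz` would handle the other ordering. Either way it is worth trying on copies first, because running it a second time appends the same data again.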
