canfiese

Reputation: 61

Concatenating Pairs of Files with Specific Naming Scheme (UBUNTU)

There appears to be a partial answer in "Loop to concatenate multiple pairs of files with almost the same name in UNIX", but that user encountered the same problem I did: it's important to preserve the original naming scheme.

I have a folder with paired files; the names look like this (all stored in the same folder/directory):

MX_HF20.1.fq.gz; MX_HF20.rem.1.fq.gz

MX_HF22.1.fq.gz; MX_HF22.rem.1.fq.gz

.

.

.

SD_F296.1.fq.gz; SD_F296.rem.1.fq.gz

SD_F297.1.fq.gz; SD_F297.rem.1.fq.gz

(Some of you might recognize this as STACKS output!)

Really, I'm just looking to append the contents of the *.1.rem.fq.gz file to the end of the *.1.fq.gz file, keeping the original *.1.fq.gz file's name.

I've toyed around with test files, so I know cat will do this even though the files are .gz. But my bash scripting abilities are poor at best, and working with and storing name variables is a concept I'm still struggling to grasp.

Many thanks!

Upvotes: 1

Views: 67

Answers (2)

zakum1

Reputation: 1112

It sounds like you are looking for something like this:

#!/bin/bash
for file1 in *.1.fq.gz; do
   # Build the partner name: X.1.fq.gz -> X.1.rem.fq.gz
   file2=`echo "$file1" | sed -E 's/(.*\.1)\.fq\.gz/\1.rem.fq.gz/'`
   cat "$file1" "$file2" > "out.$file1"
done

Backquotes execute a shell command and insert its output at that point in your script.

sed is the Unix stream editor; it manipulates lines of text.

It uses regular expressions, and in this case you need () to group and capture the first part of the file name and \1 to reference it.
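To see what the capture group does in isolation, here is a minimal sketch that runs the substitution on a single sample name. It assumes GNU sed (as on Ubuntu) and the .1.rem.fq.gz naming used in this answer; the variable names are only for illustration. Note the `.` before `*`: `.*` matches the sample prefix, and the group keeps everything up to and including `.1`.

```shell
#!/bin/bash
# Sample name, taken from the question's listing.
name='SD_F297.1.fq.gz'

# (.*\.1) captures "SD_F297.1"; \1 re-inserts it before ".rem.fq.gz".
rem=$(printf '%s\n' "$name" | sed -E 's/(.*\.1)\.fq\.gz/\1.rem.fq.gz/')

printf '%s\n' "$rem"   # prints SD_F297.1.rem.fq.gz
```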

Upvotes: 0

l'L'l

Reputation: 47169

Maybe try using Bash's regex matching with BASH_REMATCH:

#!/bin/bash

p='([A-Z]+_[A-Z]+)([0-9]+)\.1\.rem\.fq\.gz'

for f in *.gz; do
    if [[ $f =~ $p ]]; then
        cat "${f}" >> "${BASH_REMATCH[1]}${BASH_REMATCH[2]}.1.fq.gz"
    fi
done

So for example:

SD_F297.1.rem.fq.gz would be appended to SD_F297.1.fq.gz
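If the files are actually named as in the question's listing (SD_F297.rem.1.fq.gz rather than SD_F297.1.rem.fq.gz), a variant that strips the suffix with parameter expansion might look like the sketch below. The function name `append_rem_pairs` is hypothetical, and the sketch assumes every remainder file sits next to its partner in the same directory. It relies on the fact that concatenated gzip members still decompress as one stream.

```shell
#!/bin/bash
# Hypothetical helper: for every NAME.rem.1.fq.gz in directory $1,
# append its contents to NAME.1.fq.gz, keeping that file's name.
append_rem_pairs() {
    local dir=$1 rem base target
    for rem in "$dir"/*.rem.1.fq.gz; do
        [ -e "$rem" ] || continue        # the glob matched nothing
        base=${rem%.rem.1.fq.gz}         # strip the suffix, e.g. dir/SD_F297
        target=$base.1.fq.gz
        if [ -f "$target" ]; then
            cat "$rem" >> "$target"      # append in place
        else
            echo "no partner for $rem, skipping" >&2
        fi
    done
}
```

Calling `append_rem_pairs .` would process the current directory. Since the append modifies the originals, it is probably worth trying it on copies first.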

Upvotes: 1
