olliepower

Reputation: 1359

Searching files by matching filename pattern and concatenating contents of the files

I have 10 folders containing files whose names are formatted like so:

"xaaNP_len_0.fa"

or

"xaaP_len_0.fa"

The "xaa" part is unique to the folder the file is in. The folders are named

[xaa, xab, ..., xaj]

I want to concatenate all the files that match a specific pattern together.

For example, I would like to concatenate all the

P_len_*.fa

where * is an integer from 0 to 100. This should not include files where the "P" is immediately preceded by an "N".

Next, I want to concatenate all the

NP_len_*.fa

files together in the same manner across all 10 directories.

The directory structure is flat. For example,

xaa/xaaNP_len_0.fa
xab/xabNP_len_0.fa

should all go into one file named

NP_len_0.fa
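The grouping described above can also be sketched with plain bash globs instead of regexes. This is a minimal sketch, assuming bash (for brace expansion and arrays), that the folders xaa through xaj sit in the current directory, and that each file's prefix matches its folder name; concat_len_files is a hypothetical function name. Indexes with no input files produce no output file.

```shell
#!/bin/bash
# Minimal sketch, assuming the folders xaa..xaj are in the current directory
# and each file prefix equals its folder name (e.g. xab/xabP_len_3.fa).
# concat_len_files is a hypothetical name chosen for this example.
concat_len_files() (
    # The body runs in a subshell, so this shopt does not leak to the caller.
    shopt -s nullglob   # unmatched globs expand to nothing, not to themselves

    for n in {0..100}; do
        # "xa?P": exactly one character between "xa" and "P",
        # so xaaNP_len_N.fa is not picked up by the first glob.
        p_files=(xa?/xa?P_len_"$n".fa)
        np_files=(xa?/xa?NP_len_"$n".fa)

        # Create an output file only when at least one input exists.
        if (( ${#p_files[@]} )); then
            cat "${p_files[@]}" > "P_len_$n.fa"
        fi
        if (( ${#np_files[@]} )); then
            cat "${np_files[@]}" > "NP_len_$n.fa"
        fi
    done
)
```

Run concat_len_files from the directory that contains the xa? folders. Bash sorts glob results, so the contents from xaa come first in each output file.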

Upvotes: 1

Views: 215

Answers (2)

konsolebox

Reputation: 75558

Try this script:

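#!/bin/bash

# Holds a line that was read ahead but belongs to the next group.
NEXT=''

for (( ;; )); do
    # Start a new group, either with the read-ahead line or a fresh one.
    if [[ -n $NEXT ]]; then
        LINE=$NEXT
        NEXT=''
    else
        IFS= read -r LINE || break
    fi

    FILES=("$LINE")
    # Strip "xaa/xaa" to get the output name, e.g. NP_len_0.fa.
    FORMAT=${LINE#???/???}

    # Collect the following lines that end in the same name.
    while IFS= read -r LINE; do
        if [[ $LINE == ???/???"$FORMAT" ]]; then
            FILES+=("$LINE")
        else
            NEXT=$LINE
            break
        fi
    done

    echo "Concatenating ${FILES[*]} to $FORMAT."

    cat "${FILES[@]}" > "$FORMAT"
done < <(
    # Sort numerically by index so files with the same index are adjacent.
    find xa?/ -mindepth 1 -maxdepth 1 -type f -name '???P_len_*.fa' | sort -k 1.14 -n
    find xa?/ -mindepth 1 -maxdepth 1 -type f -name '???NP_len_*.fa' | sort -k 1.15 -n
)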
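Two details of the script above carry most of its logic: the ${LINE#???/???} expansion that turns a path into its output name, and the character-offset sort keys. The following sketch shows both on made-up paths; the variable names line, key and sorted are only for illustration.

```shell
#!/bin/bash
# Illustrates two details of the script above on made-up paths.

line='xab/xabNP_len_7.fa'
# ${line#???/???} deletes the shortest leading match of "???/???":
# the folder name, the slash and the three-letter file prefix.
key=${line#???/???}
echo "$key"

# In a P path such as xaa/xaaP_len_2.fa the index starts at character 14,
# so "sort -k 1.14 -n" orders the paths numerically by index and keeps
# paths with equal indexes next to each other.
sorted=$(printf '%s\n' xab/xabP_len_10.fa xaa/xaaP_len_2.fa xab/xabP_len_2.fa |
    sort -k 1.14 -n)
echo "$sorted"
```

It prints NP_len_7.fa, then xaa/xaaP_len_2.fa, xab/xabP_len_2.fa and xab/xabP_len_10.fa on separate lines. Without -n, a plain text sort on the same key would put index 10 before index 2.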

Upvotes: 1

dirtydexter

Reputation: 1073

For the second pattern, NP_len_*.fa, the regex can be:

.+NP_len_\d{1,3}\.fa
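Note that \d is Perl/PCRE syntax and is not part of POSIX extended regular expressions, which is the dialect bash's [[ =~ ]] uses. A minimal sketch of the NP pattern in that dialect, with [0-9] in place of \d, the dot before "fa" escaped, and ^...$ anchors added (an assumption, so that the whole name must match); np_re and matches_np are hypothetical names.

```shell
#!/bin/bash
# The NP regex translated to POSIX ERE for bash's [[ =~ ]]:
# [0-9] replaces \d, the dot is escaped, and ^...$ anchor the whole name.
np_re='^.+NP_len_[0-9]{1,3}\.fa$'

matches_np() {
    # $np_re is left unquoted so bash treats it as a regex, not a literal.
    [[ $1 =~ $np_re ]]
}

for name in xaaNP_len_0.fa xabNP_len_42.fa xaaP_len_0.fa; do
    if matches_np "$name"; then
        echo "match:    $name"
    else
        echo "no match: $name"
    fi
done
```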

and for the first one, where you do not want the N, use this:

.+?[^N]P_len_\d{1,3}\.fa

This one matches every name except those where an N immediately precedes the P. I used .+ rather than a fixed length in case the folder names (the xaa part) grow longer in the future; alternatively, you can match a string of exactly three characters.
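One way to apply the first regex to the actual folders is find's -regex test. This is a sketch assuming GNU find (for -regextype) and the xa? folders in the current directory; list_p_files is a hypothetical helper name. Because find matches against the whole path and POSIX ERE has no lazy +?, the .+? prefix is swapped for .*/[^/]* here, and / is excluded from the character before P.

```shell
#!/bin/bash
# Sketch assuming GNU find (for -regextype) and the xaa..xaj folders in the
# current directory; list_p_files is a hypothetical helper name.
list_p_files() {
    # -regex is matched against the whole path (e.g. xaa/xaaP_len_0.fa),
    # so the pattern starts with ".*/". "/" is excluded from [^N/] so the
    # character before P must come from the file name itself.
    find xa?/ -mindepth 1 -maxdepth 1 -type f -regextype posix-extended \
        -regex '.*/[^/]*[^N/]P_len_[0-9]{1,3}\.fa' | sort
}
```

Running list_p_files from the parent directory prints only the P files, one per line; each group sharing an index can then be passed to cat as the question describes.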

Upvotes: 1
