I have 10 folders that contain files named like so:
"xaaNP_len_0.fa"
or
"xaaP_len_0.fa"
The "xaa" part is unique to the folder the file is in. The folders are named
[xaa, xab, ..., xaj]
I want to concatenate together all the files that match a specific pattern.
For example, I would like to concatenate all the
P_len_*.fa
files, where * is an integer from 0 to 100. This should not include the files where the "P" has an "N" to its left.
Next, I want to concatenate all the
NP_len_*.fa
files together in the same manner across all 10 directories.
The directory structure is flat. For example,
xaa/xaaNP_len_0.fa
xab/xabNP_len_0.fa
should all go into one file named
NP_len_0.fa
Try this script:
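#!/bin/bash
# Reads a sorted list of paths and concatenates each run of consecutive
# paths that share the same suffix (e.g. P_len_0.fa) into a file of that name.
NEXT=''
for (( ;; )); do
    # Start a new group, either with the line left over from the previous
    # group or with the next line of input.
    if [[ -n $NEXT ]]; then
        LINE=$NEXT
        NEXT=''
    else
        IFS= read -r LINE || break
    fi
    FILES=("$LINE")
    # Strip the "xaa/xaa" prefix: xaa/xaaP_len_0.fa -> P_len_0.fa
    FORMAT=${LINE#???/???}
    # Collect the following lines with the same suffix; stop at the first
    # line that belongs to the next group and remember it in NEXT.
    while IFS= read -r LINE; do
        if [[ $LINE == ???/???"$FORMAT" ]]; then
            FILES+=("$LINE")
        else
            NEXT=$LINE
            break
        fi
    done
    echo "Concatenating ${FILES[*]} to $FORMAT."
    cat "${FILES[@]}" > "$FORMAT"
done < <(
    # '???P_len_*' requires exactly three characters before the P, so it does
    # not match the NP files. Sorting numerically from character 14 (15 for
    # NP) of e.g. xaa/xaaP_len_0.fa puts files with the same number together.
    find xa?/ -mindepth 1 -maxdepth 1 -type f -name '???P_len_*.fa' | sort -k 1.14 -n
    find xa?/ -mindepth 1 -maxdepth 1 -type f -name '???NP_len_*.fa' | sort -k 1.15 -n
)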
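As a simpler alternative (not part of the answer above), here is a sketch that relies on the question's statement that the number runs from 0 to 100: for each number, a bash glob collects the matching file from every folder. The function name concat_all, the demo folders and the file contents are hypothetical, used only for illustration.

```shell
#!/bin/bash
# Sketch: for each N in 0..100 (range assumed from the question), concatenate
# xa?/xa?P_len_N.fa into P_len_N.fa and xa?/xa?NP_len_N.fa into NP_len_N.fa.
shopt -s nullglob   # a glob with no matches expands to nothing

concat_all() {
    local n kind files
    for (( n = 0; n <= 100; n++ )); do
        for kind in P NP; do
            # 'xa?P_len_...' cannot match 'xaaNP_len_...': right after the
            # three-letter prefix the next character must be P, not N.
            files=( xa?/xa?"$kind"_len_"$n".fa )
            (( ${#files[@]} )) || continue   # no input files for this N
            cat "${files[@]}" > "${kind}_len_${n}.fa"
        done
    done
}

# Hypothetical demo: build a tiny tree in a temporary directory and run it.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/xaa" "$demo_dir/xab"
printf 'a-P0\n'  > "$demo_dir/xaa/xaaP_len_0.fa"
printf 'b-P0\n'  > "$demo_dir/xab/xabP_len_0.fa"
printf 'a-NP0\n' > "$demo_dir/xaa/xaaNP_len_0.fa"
printf 'b-NP0\n' > "$demo_dir/xab/xabNP_len_0.fa"
( cd "$demo_dir" && concat_all )
```

Bash sorts glob results, so the parts are concatenated in folder order (xaa, xab, ..., xaj). Unlike the find/sort approach, this depends on knowing the numeric range in advance.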
For the second pattern, NP_len_*.fa, the regex can be
.+NP_len_\d{1,3}\.fa
and for the first one, where you do not want the N, use this:
.+?[^N]P_len_\d{1,3}\.fa
This one matches all the names except those with an N before the P. I have assumed that the folder names (your "xaa" part) might get longer in the future; alternatively, you can match a string of exactly length 3.
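To check what these patterns accept, here is a small bash sketch. Bash's [[ =~ ]] uses POSIX extended regexes, which have no \d and no lazy +?, so the patterns are translated: \d becomes [0-9], .+? becomes .+ (laziness does not change whether a name matches), the dot before fa is escaped, and ^...$ anchors are added so the whole name must match. The function name classify and the sample names are hypothetical.

```shell
#!/bin/bash
# The answer's patterns translated to POSIX ERE for bash's [[ =~ ]]
# (translation and anchors are assumptions, see the text above).
np_re='^.+NP_len_[0-9]{1,3}\.fa$'
p_re='^.+[^N]P_len_[0-9]{1,3}\.fa$'

# Print NP, P or none depending on which pattern the name matches.
classify() {
    local name=$1
    if [[ $name =~ $np_re ]]; then
        echo NP
    elif [[ $name =~ $p_re ]]; then
        echo P
    else
        echo none
    fi
}

for f in xaaP_len_0.fa xaaNP_len_0.fa xajP_len_100.fa xabNP_len_42.fa notes.txt; do
    printf '%s -> %s\n' "$f" "$(classify "$f")"
done
```

The [^N] before the P is what keeps xaaNP_len_0.fa out of the P group: the only "P_len_" in that name is preceded by an N.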