Reputation: 1131
I have, let's say, 50 folders, each with a different number of pairs of files that are the input for a command-line tool.
#for f in ./*shuf; do #lists all the directories
#FILES=${f}/*.fastq #to get all the fastq files in the directory
FILES="./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq"
What I need to do is divide the files into their respective pairs (one r and one f for each file name), into something that looks like this (for a single pair):
echo $PAIR
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq
I will use this as an input which needs to be in this format
(`basename ${PAIR%_*}; $PAIR`):
C115_7.121017_1 ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq
And then loop through all the pairs.
I was attempting to do this with:
IFS=' ' read -ra ADDR <<< "$FILES"
echo "${ADDR[ ]}"
but I'm stuck on the error ${ADDR[ ]}: bad substitution. Could you please include an explanation of the method, as I really want to learn?
EDIT:
To clarify a bit, this is roughly the output I am looking for:
IFS=' ' read -ra ADDR <<< "$FILES"
pairs="${ADDR[@]}"
for afile in ${pairs}; do bfile=${afile%_*}; echo ${bfile}_r.fastq ${bfile}_f.fastq; done
But without the duplicated lines:
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121017_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121103_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq
./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_r.fastq ./74.C115_7.merge.align.rg.sorted.rmdup.shuf/C115_7.121214_1_f.fastq
Upvotes: 0
Views: 177
Reputation: 75458
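shopt -s nullglob        # unmatched globs expand to nothing instead of themselves
KEYS=()                  # keys in first-seen order
declare -A MAP=()        # key -> space-separated list of files
for D in ./*shuf; do
    for F in "$D"/*.fastq; do
        # strip the directory, then the trailing _f.fastq / _r.fastq
        KEY=${F##*/} KEY=${KEY%_*}
        # record each key only the first time it appears
        [[ -z ${MAP[$KEY]} ]] && KEYS+=("$KEY")
        MAP[$KEY]+=" $F"
    done
    # print the pairs of this directory, then reset for the next one
    for KEY in "${KEYS[@]}"; do
        echo "${KEY}${MAP[$KEY]}"
    done
    KEYS=()
    MAP=()
done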
Or
shopt -s nullglob
KEYS=()
declare -A MAP=()
for D in ./*shuf; do
    for F in "$D"/*.fastq; do
        KEY=${F##*/} KEY=${KEY%_*}
        [[ -z ${MAP[$KEY]} ]] && KEYS+=("$KEY")
        MAP[$KEY]+=" $F"
    done
done
for KEY in "${KEYS[@]}"; do
    echo "${KEY}${MAP[$KEY]}"
done
You need Bash 4.0 or newer for this, because associative arrays (declare -A) were introduced in Bash 4.0. Good luck.
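Since the question asked for an explanation, here is a minimal sketch of the same grouping idea applied to a space-separated string like the FILES variable in the question, instead of to globs. The function name group_pairs and the sample paths are made up for illustration. It assumes no file name contains spaces and that each name ends in _f.fastq or _r.fastq.

```shell
#!/usr/bin/env bash
# Sketch only: group_pairs is a hypothetical helper, not part of the answer above.
# Requires Bash 4.0+ because of the associative array (local -A).

group_pairs() {
    local files=$1 f key
    local -a addr keys=()
    local -A map=()

    # Split the string on spaces into an array. To expand every element,
    # the subscript must be @ (or *); the question used an empty subscript.
    IFS=' ' read -ra addr <<< "$files"

    for f in "${addr[@]}"; do
        key=${f##*/}    # remove the longest prefix ending in "/" (the directory)
        key=${key%_*}   # remove the shortest suffix starting at "_" (_f.fastq or _r.fastq)
        # First time this key is seen: remember it so output keeps input order.
        [[ -z ${map[$key]} ]] && keys+=("$key")
        map[$key]+=" $f"
    done

    # One line per key: the key followed by every file that shares it.
    for key in "${keys[@]}"; do
        printf '%s%s\n' "$key" "${map[$key]}"
    done
}

group_pairs "./d.shuf/A.1_1_f.fastq ./d.shuf/A.1_1_r.fastq ./d.shuf/A.2_1_f.fastq ./d.shuf/A.2_1_r.fastq"
```

Because both files of a pair reduce to the same key, each key is printed once, which avoids the duplicated lines shown in the question's EDIT.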
Upvotes: 1
Reputation: 295291
for f in *shuf; do
    files=( "$f"/*.fastq ) # an array of files, NOT a string
    for file in "${files[@]}"; do # expands each element into a separate parameter
        # write output; note that this is DANGEROUS because it's newline-terminating
        # ...filenames which can potentially themselves contain newlines.
        printf '%s %s\n' "$(basename "${file%_*}")" "$file"
    done
done
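The loop above prints one line per file rather than one line per pair. A minimal sketch of one way to get a single line per pair is to glob only the _f.fastq files and derive each _r.fastq mate from the name. The function name print_pairs is hypothetical, and the sketch assumes every pair follows the <sample>_f.fastq / <sample>_r.fastq naming shown in the question.

```shell
#!/usr/bin/env bash
# Sketch only: print_pairs is a hypothetical helper name.
# Assumes pairs are named <sample>_f.fastq and <sample>_r.fastq.

print_pairs() {
    # $1: directory that contains the *shuf folders
    local fwd stem rev
    for fwd in "$1"/*shuf/*_f.fastq; do
        [ -e "$fwd" ] || continue      # the glob matched nothing
        stem=${fwd%_f.fastq}           # path without the _f.fastq suffix
        rev=${stem}_r.fastq            # expected mate of the forward file
        if [ ! -e "$rev" ]; then
            printf 'missing mate for %s\n' "$fwd" >&2
            continue
        fi
        # <sample name> <forward file> <reverse file>
        printf '%s %s %s\n' "${stem##*/}" "$fwd" "$rev"
    done
}

print_pairs .
```

Each forward file is visited exactly once, so no pair is repeated, and a forward file without a mate is reported on stderr instead of producing a broken line.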
Upvotes: 0