RomainL.

Reputation: 1014

bash substitution after glob not working?

I am encountering strange behaviour with bash string substitution.

I expected the same substitution on $r1 and $var to yield exactly the same result, since both strings seem to have the same value.

But that is not the case, and I can't understand what I am missing.

Maybe it is because of the glob? I just don't know. I am not an IT person, so maybe this will be obvious to you.

(There is a Repl.it link at the bottom.)

mkdir -p T21805
touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz

r1=T21805/*R1*
echo $r1;
echo ${r1%%_S1*z}
var=T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz
echo ${var%%_S1*z}

echo $r1| hexdump -C
echo $var | hexdump -C

Output:

echo $r1

T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz

echo ${r1%%_S1*z}

T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz

echo ${var%%_S1*z}

T21805/T21805_SI-GA-D8-BH25N7DSXY

echo $r1| hexdump -C

00000000 54 32 31 38 30 35 2f 54 32 31 38 30 35 5f 53 49 |T21805/T21805_SI|
00000010 2d 47 41 2d 44 38 2d 42 48 32 35 4e 37 44 53 58 |-GA-D8-BH25N7DSX|
00000020 59 5f 53 31 5f 4c 30 30 31 5f 52 31 5f 30 30 31 |Y_S1_L001_R1_001|
00000030 2e 66 61 73 74 71 2e 67 7a 0a |.fastq.gz.|
0000003a

echo $var | hexdump -C

00000000 54 32 31 38 30 35 2f 54 32 31 38 30 35 5f 53 49 |T21805/T21805_SI|
00000010 2d 47 41 2d 44 38 2d 42 48 32 35 4e 37 44 53 58 |-GA-D8-BH25N7DSX|
00000020 59 5f 53 31 5f 4c 30 30 31 5f 52 31 5f 30 30 31 |Y_S1_L001_R1_001|
00000030 2e 66 61 73 74 71 2e 67 7a 0a |.fastq.gz.|
0000003a

Repl.it

I am interested in understanding why this is not working; I can achieve my desired output using sed, for example.

Upvotes: 0

Views: 69

Answers (2)

akira ejiri

Reputation: 123

I ran it after set -xv to see the contents of r1.

$ r1=T21805/*R1*
+ r1='T21805/*R1*'

$ var=T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz
+ var=T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz

In ${r1%%_S1*z}, the value of r1 is the literal string T21805/*R1*.

That string contains nothing matching _S1*z, so nothing is removed.
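A minimal sketch reproducing this, assuming a throwaway working directory created with mktemp -d (the variable name stripped is purely illustrative). It shows that %% is applied to the literal pattern text, and that the filename only appears later, when the unquoted result is globbed.

```shell
#!/usr/bin/env bash
# Work in a throwaway directory so the glob has a file to match.
cd "$(mktemp -d)" || exit 1
mkdir -p T21805
touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz

r1=T21805/*R1*              # stored literally: no pathname expansion in a scalar assignment
stripped=${r1%%_S1*z}       # %% runs on the text "T21805/*R1*", which has no "_S1"

printf '%s\n' "$stripped"   # quoted: prints the unchanged pattern T21805/*R1*
printf '%s\n' $stripped     # unquoted: the pattern is now globbed, printing the full filename
```

So echo ${r1%%_S1*z} in the question prints the full filename only because the unchanged pattern is expanded after the substitution has already run.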

Upvotes: 2

chepner

Reputation: 531325

Glob expansion doesn't happen at assignment time.

$ mkdir -p T21805
$ touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz
$ touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_002.fastq.gz
$ r1=T21805/*R1*
$ printf '%s\n' "$r1"
T21805/*R1*
$ printf '%s\n' $r1
T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz
T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_002.fastq.gz

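Pathname expansion happens later, when the unquoted $r1 is expanded on a command line. When you write ${r1%%_S1*z}, the value of r1 is the literal pattern T21805/*R1*, which doesn't contain the string _S1; only after the pattern has been expanded into real filenames is there an _S1 you could match against.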
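A for loop is another common way to make the glob expand first; this is an alternative to the array approach, not something taken from the answer. A minimal sketch, assuming a throwaway mktemp -d directory holding the same two example files (prefixes is a hypothetical name used only to collect the results):

```shell
#!/usr/bin/env bash
# Throwaway directory with the two example files.
cd "$(mktemp -d)" || exit 1
mkdir -p T21805
touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz
touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_002.fastq.gz

prefixes=()                         # hypothetical name: collects the stripped filenames
# The glob in the for-list is expanded before the loop body runs,
# so each $f holds a real filename that contains "_S1".
for f in T21805/*R1*; do
    [[ -e $f ]] || continue         # skip the literal pattern if nothing matched
    prefixes+=( "${f%%_S1*z}" )     # remove the longest suffix matching _S1*z
done
printf '%s\n' "${prefixes[@]}"
```

The [[ -e $f ]] guard matters because, without nullglob, an unmatched glob is passed through unchanged as a single word.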

If you assign to an array instead, the rules are different: the glob is expanded before the assignment, so you can do your filtering on each element of the array.

$ r1=( T21805/*R1* )
$ printf '%s\n' "${r1[@]}"
T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz
T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_002.fastq.gz
$ printf '%s\n' "${r1[@]%%_S1*z}"
T21805/T21805_SI-GA-D8-BH25N7DSXY
T21805/T21805_SI-GA-D8-BH25N7DSXY
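If you expect exactly one R1 file, as in the question, a sketch building on the array approach could turn on nullglob and check the element count before stripping. The "exactly one file" rule, the mktemp -d directory, and the name prefix are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Throwaway directory with a single R1 file, as in the question.
cd "$(mktemp -d)" || exit 1
mkdir -p T21805
touch T21805/T21805_SI-GA-D8-BH25N7DSXY_S1_L001_R1_001.fastq.gz

shopt -s nullglob            # an unmatched glob expands to nothing instead of to itself
r1=( T21805/*R1* )           # glob is expanded here, before the array assignment
shopt -u nullglob

# Assumed rule: exactly one R1 file must exist.
if (( ${#r1[@]} != 1 )); then
    printf 'expected exactly one R1 file, found %d\n' "${#r1[@]}" >&2
    exit 1
fi

prefix=${r1[0]%%_S1*z}       # the element is a real filename, so "_S1" is present
printf '%s\n' "$prefix"      # prints T21805/T21805_SI-GA-D8-BH25N7DSXY
```

Without nullglob, a missing file would leave the literal T21805/*R1* in the array and the count check would pass silently.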

Upvotes: 2
