nad87563

Reputation: 4072

regex for finding file extension

I am using the regex below in my script to read files whose names end in _L001_R1_001.fastq or _L001_R2_001.fastq.

If the file is an R1 file it should be read into readPair_1, and if it is an R2 file it should be read into readPair_2, but the regex is not matching anything.

Can anyone please tell me what is wrong here?

My script:

#! /bin/bash -l

Proj_Dir="${se_ProjDir}/*.fastq"

for Dir in $Proj_Dir
do

        if [[ "$Dir" =~ _L.*_R1_001.fastq]]
        then

            readPair_1=$Dir
            echo $readPair_1

        fi
        if [[ "$Dir" =~ _L.*_R2_001.fastq]]
        then

            readPair_2=$Dir
            echo $readPair_2

        fi

Files:

Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R2_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S2_L001_R2_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S3_L001_R1_001.fastq
Next-ID-1-MN-SM5144-170509-ABC_S3_L001_R2_001.fastq

Upvotes: 0

Views: 223

Answers (3)

JooMing

Reputation: 932

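The =~ operator in bash succeeds when the regular expression matches any part of the string, so the pattern does not have to cover the whole file name. If you want the pattern to describe the whole string explicitly, you can write the regular expressions in the if statements as .*_L.*_R1_001\.fastq$ and .*_L.*_R2_001\.fastq$.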
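
A quick way to see how a candidate pattern behaves is to test it against one of the file names from the question. This is only a sketch, assuming bash; the helper name matches is made up for illustration. Note that =~ succeeds on a substring match, so the pattern with and without the leading .* both match here.

```shell
#!/bin/bash
# Hypothetical helper (not part of the original script):
# returns 0 if the regex in $2 matches the string in $1.
matches() {
    local name=$1 re=$2
    # Keeping the regex in a variable avoids quoting surprises inside [[ ]].
    [[ $name =~ $re ]]
}

name='Next-ID-1-MN-SM5144-170509-ABC_S1_L001_R1_001.fastq'

# =~ is unanchored: a pattern that covers only part of the name still matches.
matches "$name" '_L.*_R1_001\.fastq'   && echo "unanchored pattern matches"
matches "$name" '.*_L.*_R1_001\.fastq' && echo "pattern with leading .* matches"

# With ^ and $ the pattern must cover the whole name, so this one fails.
matches "$name" '^_L.*_R1_001\.fastq$' || echo "fully anchored short pattern does not match"
```

All three messages are printed, which suggests the missing match in the question comes from something other than the absence of a leading .*.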

Upvotes: 0

Jack

Reputation: 6158

You need .gz at the end of your pattern. You're not getting any files at all:

Proj_Dir="${se_ProjDir}/*.fastq.gz"

You also need a space before the closing ]]:

if [[ "$Dir" =~ _L.*_R1_001.fastq ]]

and

if [[ "$Dir" =~ _L.*_R2_001.fastq ]]
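
Putting the pieces together, the loop could look something like the sketch below. The function name classify_reads and the "R1"/"R2" output labels are my own inventions for illustration, not part of the original script. Since the question lists plain .fastq names while the glob above assumes .fastq.gz, the pattern here accepts both; it also uses [0-9]+ for the lane number instead of .*, which is a tightening on my part.

```shell
#!/bin/bash
# Hypothetical helper: for each FASTQ file in the directory given as $1,
# print "R1 <path>" or "R2 <path>" depending on which read it holds.
classify_reads() {
    local dir=$1 file
    # Regexes are kept in variables so the dots can be escaped cleanly.
    local re_r1='_L[0-9]+_R1_001\.fastq(\.gz)?$'
    local re_r2='_L[0-9]+_R2_001\.fastq(\.gz)?$'

    for file in "$dir"/*.fastq "$dir"/*.fastq.gz; do
        # If a glob matched nothing it stays literal; skip that case.
        [[ -e $file ]] || continue

        if [[ $file =~ $re_r1 ]]; then      # note the space before ]]
            echo "R1 $file"
        elif [[ $file =~ $re_r2 ]]; then
            echo "R2 $file"
        fi
    done
}
```

It would be called as classify_reads "$se_ProjDir", with se_ProjDir set as in the question. In the original script you would assign to readPair_1 and readPair_2 in the two branches instead of echoing a label.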

Upvotes: 1

David Maddox

Reputation: 2104

Try:

L001_R[12]_001\.fastq\.gz$

This matches either the R1 or the R2 files, and the trailing $ ensures that this is how the file name ends.
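
If you also need to know which of the two reads a file is, one option is to put the [12] in a capture group and read it back from BASH_REMATCH. This is a rough sketch assuming bash and .fastq.gz names; the function name read_number is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: prints 1 or 2 for an R1/R2 file name and
# returns non-zero for any name that does not end in the expected suffix.
read_number() {
    # The parentheses capture the read number; $ anchors to the end of the name.
    local re='L001_R([12])_001\.fastq\.gz$'
    [[ $1 =~ $re ]] || return 1
    # BASH_REMATCH[1] holds the text matched by the first capture group.
    echo "${BASH_REMATCH[1]}"
}
```

The result can then pick the variable to fill, for example with a case statement on the output of read_number "$Dir".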

Upvotes: 0
