gwilymh

Reputation: 425

RegEx in Python not returning matches

I am trying to extract certain lines out of a file if they match certain criteria. Specifically, column [3] needs to start with Chr3:, and column [13] needs to be "yes".

Here are examples of lines that match and do not match the criteria:

XLOC_004170   XLOC_004170 -   Ch3:14770-25031 SC_JR32_Female  SC_JR32_Male    OK  55.8796 9.2575  -2.59363    -0.980118   0.49115 0.897554    no
XLOC_004387   XLOC_004387 -   Ch3:3072455-3073591 SC_JR32_Female  SC_JR32_Male    OK  0   35.4535 inf -nan    5e-05   0.0149954   yes

The Python script I am using is:

with open(input_file) as fp: # fp is the file handle
    for line in fp: #line is the iterator
        line=line.split("\t")
        locus = str(line[3])
        significance = str(line[13])
        print(locus)
        print(significance)

        if (re.match('Chr3:[0-9]+-[0-9]+',locus,flags=0) and re.match('yes',significance,flags=0)):
            output.write(("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n")%(line[0],line[1],line[2],line[3],line[4],line[5],line[6],line[7],line[8],line[9],line[10],line[11],line[12],line[13]))

I would really be grateful if anyone could explain why this script returns no outputs.

Upvotes: 0

Views: 95

Answers (2)

alecxe

Reputation: 473823

You don't need regex for such simple checks. Better use startswith() and ==:

if locus.startswith('Chr3:') and significance == 'yes':

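UPD: You also need to call strip() on the locus and significance variables before the if condition, because the last field on each line still carries the trailing newline and so never equals 'yes' exactly: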

locus = str(line[3]).strip()
significance = str(line[13]).strip()
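
To see why strip() matters, here is a small sketch using the matching sample line from the question. It assumes the real file is tab-separated, as the script's split("\t") implies. After splitting, the last field still ends with the newline, so the == comparison fails until it is stripped.

```python
# The "yes" sample line from the question, as it would be read from the file
# (assumes the columns are separated by tabs).
sample = (
    "XLOC_004387\tXLOC_004387\t-\tCh3:3072455-3073591\tSC_JR32_Female\t"
    "SC_JR32_Male\tOK\t0\t35.4535\tinf\t-nan\t5e-05\t0.0149954\tyes\n"
)

fields = sample.split("\t")

raw_significance = fields[13]        # still ends with "\n"
significance = fields[13].strip()    # trailing newline removed

raw_equals_yes = raw_significance == "yes"      # False: "yes\n" != "yes"
stripped_equals_yes = significance == "yes"     # True
```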

Upvotes: 3

Blender

Reputation: 298106

There's really no reason to use regex here:

with open(input_file) as handle:
    for line in handle:
        # Drop the trailing newline so the last column compares cleanly
        cells = line.rstrip('\n').split('\t')

        locus = cells[3]
        significance = cells[13]

        if locus.startswith('Ch3:') and significance == 'yes':
            output.write('\t'.join(cells) + '\n')
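
The same idea can be tried without a file. The sketch below runs the check against the two sample lines from the question, using the question's column indices (3 and 13) and assuming tab-separated input. The function name filter_lines and its prefix parameter are made up for illustration. Note that the sample lines show Ch3: rather than Chr3:, which is presumably why a pattern starting with Chr3: never matched.

```python
def filter_lines(lines, prefix="Ch3:"):
    """Yield rows whose column 3 starts with `prefix` and whose column 13 is 'yes'."""
    for line in lines:
        cells = line.rstrip("\n").split("\t")
        if len(cells) < 14:
            continue  # skip blank or short lines instead of raising IndexError
        if cells[3].startswith(prefix) and cells[13] == "yes":
            yield "\t".join(cells)


# The two sample lines from the question (tab separation is an assumption).
samples = [
    "XLOC_004170\tXLOC_004170\t-\tCh3:14770-25031\tSC_JR32_Female\t"
    "SC_JR32_Male\tOK\t55.8796\t9.2575\t-2.59363\t-0.980118\t0.49115\t0.897554\tno\n",
    "XLOC_004387\tXLOC_004387\t-\tCh3:3072455-3073591\tSC_JR32_Female\t"
    "SC_JR32_Male\tOK\t0\t35.4535\tinf\t-nan\t5e-05\t0.0149954\tyes\n",
]

matches = list(filter_lines(samples))                       # only the "yes" row
chr3_matches = list(filter_lines(samples, prefix="Chr3:"))  # empty for this data
```

If the real file does use Chr3:, pass prefix="Chr3:" and the rest stays the same.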

Upvotes: 3
