Reputation: 425
I am trying to extract certain lines from a file if they match certain criteria. Specifically, column [3] (counting from zero) needs to start with Chr3:, and column [13] needs to be "yes".
Here are examples of lines that match and do not match the criteria:
XLOC_004170 XLOC_004170 - Ch3:14770-25031 SC_JR32_Female SC_JR32_Male OK 55.8796 9.2575 -2.59363 -0.980118 0.49115 0.897554 no
XLOC_004387 XLOC_004387 - Ch3:3072455-3073591 SC_JR32_Female SC_JR32_Male OK 0 35.4535 inf -nan 5e-05 0.0149954 yes
The Python script I am using is:
import re

with open(input_file) as fp: # fp is the file handle
    for line in fp: # line is the iterator
        line = line.split("\t")
        locus = str(line[3])
        significance = str(line[13])
        print(locus)
        print(significance)
        if (re.match('Chr3:[0-9]+-[0-9]+', locus, flags=0) and re.match('yes', significance, flags=0)):
            output.write(("%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\t%s\n")%(line[0],line[1],line[2],line[3],line[4],line[5],line[6],line[7],line[8],line[9],line[10],line[11],line[12],line[13]))
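Running the two regex checks against the locus values from the sample rows above shows where the filter fails. This sketch assumes the file really is tab-separated, so line[3] holds values such as Ch3:3072455-3073591:

```python
import re

# Locus values copied from the two sample rows
loci = ["Ch3:14770-25031", "Ch3:3072455-3073591"]

for locus in loci:
    # The script's pattern expects the prefix "Chr3:"
    chr3 = re.match(r"Chr3:[0-9]+-[0-9]+", locus)
    # The same pattern with the prefix the sample data actually uses
    ch3 = re.match(r"Ch3:[0-9]+-[0-9]+", locus)
    print(locus, chr3 is not None, ch3 is not None)
```

Every sample locus fails the Chr3: pattern, so the if condition is never true and nothing is written.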
I would really be grateful if anyone could explain why this script produces no output.
Upvotes: 0
Views: 95
Reputation: 473823
You don't need regex for such simple checks. Better to use startswith() and ==:
if locus.startswith('Chr3:') and significance == 'yes':
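If it is unclear which prefix the file uses (the sample rows show Ch3: while the stated criterion is Chr3:), str.startswith() also accepts a tuple of prefixes and returns True if any of them matches. A small sketch; the helper name is_chr3 is hypothetical:

```python
def is_chr3(locus):
    # str.startswith accepts a tuple: True if the string starts with any entry.
    # Keeping the colon in each prefix stops "Ch30:" or "Chr31:" from matching.
    return locus.startswith(("Chr3:", "Ch3:"))

print(is_chr3("Ch3:3072455-3073591"))  # True
print(is_chr3("Chr3:100-200"))         # True
print(is_chr3("Ch30:100-200"))         # False
```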
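UPD:
You need to apply strip() to the locus and significance variables before the if condition. Column [13] is the last column, so it still carries the line's trailing newline: significance is "yes\n", and == 'yes' is False without stripping it: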
locus = str(line[3]).strip()
significance = str(line[13]).strip()
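The trailing-newline effect can be checked on a single row rebuilt from the sample data. This assumes tab-separated columns, as the question's split("\t") implies:

```python
import re

# One row as the file iterator yields it: tab-separated, newline still attached
row = ["XLOC_004387", "XLOC_004387", "-", "Ch3:3072455-3073591",
       "SC_JR32_Female", "SC_JR32_Male", "OK", "0", "35.4535",
       "inf", "-nan", "5e-05", "0.0149954", "yes"]
line = "\t".join(row) + "\n"

fields = line.split("\t")
print(repr(fields[13]))             # 'yes\n'
print(fields[13] == "yes")          # False: the newline is part of the field
print(fields[13].strip() == "yes")  # True after stripping whitespace

# re.match only anchors at the start of the string, so the original
# regex check on significance was not affected by the newline
print(re.match("yes", fields[13]) is not None)  # True
```

So strip() matters once you switch to ==; with the original re.match the newline was harmless.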
Upvotes: 3
Reputation: 298106
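There's really no reason to use regex here. Also note that your data uses Ch3:, not Chr3:, which is why the original pattern never matches anything: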
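with open(input_file) as handle:
    for line in handle:
        # Drop the trailing newline so the last column compares cleanly
        cells = line.rstrip('\n').split('\t')
        locus = cells[3]          # e.g. Ch3:3072455-3073591
        significance = cells[13]  # "yes" or "no"
        if locus.startswith('Ch3:') and significance == 'yes':
            output.write('\t'.join(cells) + '\n')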
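Python's csv module with delimiter="\t" is another option: csv.reader strips the line terminator, so no manual strip() is needed. The sketch below is self-contained; filter_rows and the in-memory sample are hypothetical, and it assumes no field contains double-quote characters (csv would otherwise treat them as quoting):

```python
import csv
import io

def filter_rows(infile, outfile, prefix="Ch3:"):
    """Copy rows whose column [3] starts with prefix and whose column [13] is 'yes'."""
    reader = csv.reader(infile, delimiter="\t")
    writer = csv.writer(outfile, delimiter="\t", lineterminator="\n")
    for row in reader:
        # Skip blank or short rows instead of raising IndexError
        if len(row) < 14:
            continue
        locus, significance = row[3], row[13]
        if locus.startswith(prefix) and significance == "yes":
            writer.writerow(row)

# The two sample rows, rebuilt as tab-separated text
sample = (
    "XLOC_004170\tXLOC_004170\t-\tCh3:14770-25031\tSC_JR32_Female\tSC_JR32_Male\tOK"
    "\t55.8796\t9.2575\t-2.59363\t-0.980118\t0.49115\t0.897554\tno\n"
    "XLOC_004387\tXLOC_004387\t-\tCh3:3072455-3073591\tSC_JR32_Female\tSC_JR32_Male\tOK"
    "\t0\t35.4535\tinf\t-nan\t5e-05\t0.0149954\tyes\n"
)
out = io.StringIO()
filter_rows(io.StringIO(sample), out)
print(out.getvalue(), end="")  # only the XLOC_004387 row
```

With real files, open both with newline="" as the csv documentation recommends, e.g. with open(input_file, newline="") as src, open(output_file, "w", newline="") as dst: filter_rows(src, dst), where output_file is a placeholder name.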
Upvotes: 3