Évariste Galois

Reputation: 1033

Python comparing strings within a conditional

I have a text file called dna.txt which contains:

>A
ACG
>B
CCG
>C
CCG
>D
TCA

I want to write a Python program that compares every sequence after the first one to the first sequence (ACG), printing "Conserved" if the sequences match and "Not Conserved" if they don't. My current approach is extremely inefficient and only handles up to 30 sequences in the file, so I was wondering how a loop could be used to simplify it. This is a short sample of the inefficient method I used:

import linecache

f = open("dna.txt")
sequence_1 = linecache.getline('dna.txt', 2)
sequence_2 = linecache.getline('dna.txt', 4)
sequence_3 = linecache.getline('dna.txt', 6)
# ... and so on: sequence_x = linecache.getline('dna.txt', 2 * x)
f.close()
if sequence_2 == sequence_1:
    print("Conserved")
else:
    print("Not Conserved")
if sequence_3 == sequence_1:
    print("Conserved")
else:
    print("Not Conserved")
# ... and the same if/else repeated for every sequence_x

As you can obviously tell, this is probably the worst way of trying to accomplish what I'm trying to do. Help would be much appreciated, thanks!

Upvotes: 0

Views: 69

Answers (2)

TheSoundDefense

Reputation: 6935

A loop would definitely make this more efficient. Here's a possibility:

with open("dna.txt", "r") as f:
    f.readline()                         # Skip the first header line (">A").
    sequence_1 = f.readline().strip()    # Get the actual sequence, without its newline.
    sequence_line = False                # This will switch back and forth to skip every other line.
    for line in f:                       # Iterate over all remaining lines.
        if sequence_line:                # Only test this every other line.
            if line.strip() == sequence_1:
                print("Conserved")
            else:
                print("Not Conserved")
        sequence_line = not sequence_line   # Switch the boolean every iteration.

The sequence_line boolean indicates whether we are looking at a sequence line or not. The line sequence_line = not sequence_line will flip it back and forth for every loop iteration, so it's True every other time. That's how we can skip every other line and only compare the ones we care about.
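
The same toggle can be wrapped in a function that accepts any iterable of lines, which makes it easy to try without a file on disk. This is only a minimal sketch, assuming headers and sequences strictly alternate; compare_to_first is a hypothetical name, and the strip() calls are an addition so that a missing final newline doesn't cause a false mismatch.

```python
def compare_to_first(lines):
    """Return "Conserved"/"Not Conserved" for each sequence after the first."""
    lines = iter(lines)
    next(lines)                     # Skip the first header (">A").
    first = next(lines).strip()     # The reference sequence.
    results = []
    sequence_line = False           # False on header lines, True on sequence lines.
    for line in lines:
        if sequence_line:
            results.append("Conserved" if line.strip() == first else "Not Conserved")
        sequence_line = not sequence_line   # Flip on every iteration.
    return results

# The contents of dna.txt from the question; the last line has no newline.
sample = [">A\n", "ACG\n", ">B\n", "CCG\n", ">C\n", "CCG\n", ">D\n", "TCA"]
print(compare_to_first(sample))
# ['Not Conserved', 'Not Conserved', 'Not Conserved']
```

With the sample data nothing matches ACG, so every result is "Not Conserved".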

This method may not be as fast as a list comprehension, but it avoids loading the entire file into memory, which matters if the file is prohibitively large. If the file fits in memory, Emanuele Paolini's solution is probably going to be quite fast.
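
If you want the every-other-line skipping without the boolean and still without holding the file in memory, itertools.islice can pick out the sequence lines lazily. This is a sketch under the same alternating-lines assumption; conserved_flags is a hypothetical name, and the demo data is made up (it is not the dna.txt from the question) so that one sequence actually matches.

```python
import io
from itertools import islice

def conserved_flags(lines):
    """Lazily yield True/False for each sequence after the first one."""
    # Items 1, 3, 5, ... (0-based) are sequence lines; the others are headers.
    sequences = (line.strip() for line in islice(lines, 1, None, 2))
    first = next(sequences, None)   # Reference sequence, or None if there is no data.
    if first is None:
        return
    for seq in sequences:
        yield seq == first

# io.StringIO stands in for an open file so the sketch runs on its own.
demo = io.StringIO(">A\nACG\n>B\nCCG\n>C\nACG\n>D\nTCA\n")
for same in conserved_flags(demo):
    print("Conserved" if same else "Not Conserved")
```

An open file object can be passed in place of demo, since it is consumed one line at a time in the same way.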

Upvotes: 3

Emanuele Paolini

Reputation: 10162

with open("dna.txt") as f:
    # Keep only the sequence lines; drop the ">" header lines.
    lines = [line.strip() for line in f if not line.startswith('>')]
for line in lines[1:]:
    if line == lines[0]:
        print("Conserved")
    else:
        print("Not Conserved")

Upvotes: 1
