Reputation: 559
I have a file that has sequence on line 2 and variable called tokenizer, which give me an old position value. I am trying to find the new position.. For example tokenizer for this line give me position 12, which is E by counting letters only until 12. So i need to figure out the new position by counting dashes...
---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------
This is what i have so far it still doesn't work.
with open(filename) as f:
countletter = 0
countdash = 0
for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
tokenizer=line.split()[4]
print tokenizer
for i,character in enumerate(line2):
for countletter <= tokenizer:
if character != '-':
countletter += 1
if character == '-':
countdash +=1
my new position should be 32 for this example
Upvotes: 0
Views: 839
Reputation: 3215
First answer, edited by Chad D to make it 1-indexed (but incorrect):
def get_new_index(string, char_index):
chars = 0
for i, char in enumerate(string):
if char != '-':
chars += 1
if char_index == chars:
return i+1
Rewritten version:
import re
def get(st, char_index):
chars = -1
for i, char in enumerate(st):
if char != '-':
chars += 1
if char_index == chars:
return i
def test():
st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
initial = re.sub('-', '', st)
for i, char in enumerate(initial):
print i, char, st[get_1_indexed(st, i)]
def get_1_indexed(st, char_index):
return 1 + get(st, char_index - 1)
def test_1_indexed():
st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
initial = re.sub('-', '', st)
for i, char in enumerate(initial):
print i+1, char, st[get_1_indexed(st, i + 1) - 1]
Upvotes: 1
Reputation: 304185
This is a silly way to get the second line, it would be clearer to use an islice, or next(f)
for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
Here count_letter
seems to be an int
while tokenizer
is a str
. Probably not what you expect.
for countletter <= tokenizer:
It's also a syntax error, so I think this isn't the code you are running
Perhaps you should have
tokenizer = int(line.split()[4])
to make tokenizer
into an int
print tokenizer
can be misleading because int
and str
look identical, so you see what you expect to see. Try print repr(tokenizer)
instead when you are debugging.
once you make sure tokenizer is an int, you can change this line
for i,character in enumerate(line2[:tokenizer]):
Upvotes: 0
Reputation: 13196
my original text looks like this and the position i was interested in was 12 which is 'E'
Actually, it's K, assuming you're using zero indexed strings. Python uses zero indexing so unless you're jumping through hoops to 1-index things (and you're not) it will give you K. If you were running into issues, try addressing this.
Here's some code for you that does what you need it to (albeit with 0-indexing, not 1-indexing). This can be found online here:
def get_new_index(oldindex, str):
newindex = 0
for c in str:
if c != '-':
if oldindex == 0:
return newindex
oldindex -= 1
newindex += 1
return 1 / 0 # throw a shitfit if we don't find the index
Upvotes: 0