Chad D

Reputation: 559

How to process character by character in a line

I have a file that has a sequence on line 2 and a value, which I call tokenizer, that gives me an old position. I am trying to find the new position. For example, the tokenizer for this line gives me position 12, which is E when counting letters only up to 12. So I need to work out the new position by counting the dashes as well.

---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------

This is what I have so far; it still doesn't work.

with open(filename) as f:
    countletter = 0
    countdash = 0
    for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
        tokenizer=line.split()[4]
        print tokenizer

        for i,character in enumerate(line2):

            for countletter <= tokenizer:

                if character != '-': 
                    countletter += 1
                if character == '-':
                    countdash +=1

My new position should be 32 for this example.

Upvotes: 0

Views: 839

Answers (3)

Brenden Brown

Reputation: 3215

First answer, edited by Chad D to make it 1-indexed (but incorrect):

def get_new_index(string, char_index):
    chars = 0
    for i, char in enumerate(string):
        if char != '-':
            chars += 1
        if char_index == chars:
            return i+1

Rewritten version:

import re

def get(st, char_index):
    chars = -1
    for i, char in enumerate(st):
        if char != '-':
            chars += 1
        if char_index == chars:
            return i

def test():
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
    initial = re.sub('-', '', st)
    for i, char in enumerate(initial):
        # 0-indexed check, so call get() rather than get_1_indexed()
        print i, char, st[get(st, i)]

def get_1_indexed(st, char_index):
    return 1 + get(st, char_index - 1)

def test_1_indexed():
    st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
    initial = re.sub('-', '', st)
    for i, char in enumerate(initial):
        print i+1, char, st[get_1_indexed(st, i + 1) - 1]
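If several positions have to be converted on the same line, one alternative (not the approach above, just a sketch) is to build the list of letter positions once and index into it. The names letter_positions and new_position are made up for illustration, and the short test string stands in for a real line. It is written so that it also runs on Python 3.

```python
def letter_positions(line):
    """Return the 1-indexed position in line of every non-dash character."""
    return [i + 1 for i, char in enumerate(line) if char != '-']


def new_position(line, old_position):
    """Map a 1-indexed letters-only position to a 1-indexed position in line."""
    positions = letter_positions(line)
    if not 1 <= old_position <= len(positions):
        raise IndexError('old_position is outside the letters in this line')
    return positions[old_position - 1]


print(new_position('--AB-C', 3))  # 6: 'C' is the 3rd letter and the 6th character
```

For the line in the question, old position 12 should come out as 32, the value the asker expects.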

Upvotes: 1

John La Rooy

Reputation: 304185

This is a silly way to get the second line. It would be clearer to use islice or next(f):

for line, line2 in itertools.izip_longest(f, f, fillvalue=''):
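As a rough sketch of the next(f) idea: loop over the file for the first line of each pair and pull the second line inside the loop. The helper name read_pairs and the sample header are assumptions, since the question does not show the real header line. io.StringIO stands in for the open file.

```python
import io


def read_pairs(f):
    """Yield (header, sequence) pairs taken from consecutive lines of f."""
    for header in f:
        # next(f, '') fetches the following line, or '' if the file ends early
        sequence = next(f, '')
        yield header.rstrip('\n'), sequence.rstrip('\n')


# Hypothetical two-line record; the fifth field plays the role of tokenizer.
sample = io.StringIO(u"id a b c 12\n--AB-C\n")
pairs = list(read_pairs(sample))
print(pairs[0][1])  # --AB-C
```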

Here countletter seems to be an int while tokenizer is a str. That is probably not what you expect.

    for countletter <= tokenizer:

It's also a syntax error, so I think this isn't the code you are running.

Perhaps you should have

tokenizer = int(line.split()[4]) 

to make tokenizer into an int.

print tokenizer can be misleading because an int and a str print identically, so you see what you expect to see. Try print repr(tokenizer) instead when you are debugging.
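A small illustration of the difference, using a made-up header line (the real one is not shown in the question). print is called with parentheses so the snippet also runs on Python 3.

```python
# Hypothetical header line; only the fifth whitespace-separated field matters here.
line = "id a b c 12"

tokenizer = line.split()[4]
print(tokenizer)         # 12
print(repr(tokenizer))   # '12'  (the quotes reveal a str)

tokenizer = int(tokenizer)
print(repr(tokenizer))   # 12    (no quotes: an int)
```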

Once you make sure tokenizer is an int, you can change the loop line to this:

    for i,character in enumerate(line2[:tokenizer]):
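For completeness, here is one way the counting loop from the question could look once tokenizer is an int. This is only a sketch: the function name new_position_by_counting is made up, and it assumes positions are 1-indexed, as in the question.

```python
def new_position_by_counting(line2, tokenizer):
    """Return the 1-indexed position in line2 of the tokenizer-th letter, or None."""
    countletter = 0
    countdash = 0
    # strip the trailing newline so it is not counted as a letter
    for character in line2.rstrip('\n'):
        if character == '-':
            countdash += 1
        else:
            countletter += 1
            if countletter == tokenizer:
                # every character seen so far is either a letter or a dash
                return countletter + countdash
    return None  # line2 has fewer than tokenizer letters


print(new_position_by_counting('--AB-C\n', 3))  # 6 (3 letters + 3 dashes)
```

The new position is simply the old position plus the number of dashes seen before reaching that letter.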

Upvotes: 0

Wug

Reputation: 13196

You wrote: "my original text looks like this and the position i was interested in was 12 which is 'E'"

Actually, it's K, assuming you're using zero-indexed strings. Python uses zero indexing, so unless you're jumping through hoops to 1-index things (and you're not), it will give you K. If you were running into issues, try addressing this first.
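To see the off-by-one concretely, strip the dashes from the line in the question and index the letters directly. The variable names are just for illustration.

```python
st = '---------------LL---NE--HVKTHTEEK---PF-ICTVCR-KS----------'
letters = st.replace('-', '')

print(letters[12])      # K: index 12 when counting from zero
print(letters[12 - 1])  # E: the 12th letter when counting from one
```

So a 1-indexed position has to be reduced by one before it is used as a Python index, and the result has to be increased by one if a 1-indexed answer is wanted.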

Here's some code that does what you need (albeit with 0-indexing, not 1-indexing):

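def get_new_index(oldindex, s):
    # s is the dashed line; oldindex is the 0-indexed position among letters only
    newindex = 0

    for c in s:
        if c != '-':
            if oldindex == 0:
                return newindex
            oldindex -= 1
        newindex += 1

    # fail loudly if the line has too few letters
    raise IndexError('letter index out of range')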

Upvotes: 0
