user3541098

Reputation: 109

Converting str to int in Python, only numbers w/o characters

I'm a beginner in Python and I can't find an answer to my problem. I have a file with some data and I want to get numbers from this file. My program looks like this:

class Mojaklasa:
def przenumeruj_pdb(self):
    nazwa=raw_input('Podaj nazwe pliku: ')
    plik=open(nazwa).readlines()
    write=open('out.txt','w')
    for i in plik:
        j=i.split()
        if len(j)>5:
            if j[0] == "ATOM":
                    write.write(j[5])
                    write.write("\n")
    zapis.close()

The 5th field in the file holds numbers from -19 to 100, and for those it works perfectly. But sometimes the 5th field has a number followed by a letter, e.g. 28A, and I only want 28. Converting to int doesn't work. How can I do this?

Upvotes: 1

Views: 243

Answers (3)

Mark Ransom

Reputation: 308138

You can replace a lot of logic with a properly formatted regular expression.

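import re

for i in plik:
    # "ATOM", four more whitespace-separated fields, then the leading integer of the next field
    m = re.match(r'ATOM\s+\S+\s+\S+\s+\S+\s+\S+\s+(-?\d+)', i)
    if m:
        write.write(m.group(1) + '\n')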
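
As a rough, self-contained sketch of the regular-expression idea, the snippet below applies a pattern to a few made-up lines. It uses `\S+` for each skipped field so that a skipped field cannot absorb whitespace. The helper name `extract_number` and the sample lines are assumptions for illustration; they are not taken from the question's file.

```python
import re

# "ATOM", four more whitespace-separated fields, then an optionally
# negative integer at the start of the field at index 5.
ATOM_RE = re.compile(r'ATOM\s+\S+\s+\S+\s+\S+\s+\S+\s+(-?\d+)')

def extract_number(line):
    """Return the leading integer of the field at index 5 as a string, or None."""
    m = ATOM_RE.match(line)
    return m.group(1) if m else None

# Made-up sample lines in a whitespace-separated layout (assumption).
sample = [
    "ATOM      1  N   MET A  28A     27.340  24.430   2.614",
    "ATOM      2  CA  MET A -19      26.266  25.413   2.842",
    "HETATM    3  O   HOH A 100      10.000  10.000  10.000",
]
results = [extract_number(line) for line in sample]
print(results)
```

Lines that do not start with ATOM, or whose field at index 5 does not begin with a number, yield None and can simply be skipped.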

Upvotes: 1

Padraic Cunningham

Reputation: 180401

In Python 2, str.translate with a deletion argument will remove any letters:

s = "10A"
from string import ascii_letters
print(int(s.translate(None,ascii_letters)))
10

Or use re:

import re
print(int(re.findall(r"-?\d+", s)[0]))
10
In [22]: s = "-100A"   
In [23]: int(re.findall("\-?\d+",s)[0])
Out[23]: -100

In [24]: int(s.translate(None,ascii_letters))
Out[24]: -100
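
The two-argument form `s.translate(None, ascii_letters)` is specific to Python 2. For anyone on Python 3, a minimal sketch of the same two approaches might look like this, using `str.maketrans` to build the deletion table. The variable names are only illustrative.

```python
import re
from string import ascii_letters

s = "-100A"

# Python 3: the third argument of str.maketrans lists characters to delete.
delete_letters = str.maketrans("", "", ascii_letters)
via_translate = int(s.translate(delete_letters))

# The regex route is the same on Python 2 and Python 3.
via_regex = int(re.findall(r"-?\d+", s)[0])

print(via_translate, via_regex)

# The two approaches differ when a letter sits between digits:
# translate deletes the letter and joins the digits, the regex keeps
# only the first run of digits.
joined = "1A2".translate(delete_letters)
first_run = re.findall(r"-?\d+", "1A2")[0]
```

Which behaviour is preferable depends on the data. For values like 28A, where the letter only ever follows the number, both give the same result.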

I would also change your code a bit:

from string import ascii_letters

class Mojaklasa: # unless there are more methods I would not use a class 
    def przenumeruj_pdb(self):
        nazwa = raw_input('Podaj nazwe pliku: ')
        with open(nazwa) as plik, open("out.txt", "w") as write: # with will close your files 
            for line in plik: # iterate over file object
                j = line.split()
                if len(j) > 5 and j[0] == "ATOM": # same as nested if's
                    write.write("{}\n".format(j[5].translate(None, ascii_letters)))

Upvotes: 3

Alex Martelli

Reputation: 881595

Here's an overall-improved approach:

import re

class Mojaklasa:
    def przenumeruj_pdb(self):
        nazwa=raw_input('Podaj nazwe pliku: ')
        with open(nazwa) as plik, open('out.txt','w') as zapis:
            for i in plik:
                j = i.split()
                if len(j) <= 5: continue
                if j[0] == "ATOM":
                    mo = re.match(r'-?\d+', j[5])  # optional minus sign, then the leading digits
                    if mo is None: continue
                    zapis.write(mo.group() + '\n')

I have not improved your choice of identifiers (except for some confusion between write and zapis), but improvements include (a) useful indenting, (b) use of with to open files (so they're automatically closed), (c) extraction of leading digits only via re (the one most germane to your Q), (d) use of if/continue lines rather than indenting (as "flat is better than nested"). Plus perhaps some I may be forgetting:-)

I would also recommend renaming i to line and j to fields (or whatever the equivalents are in your chosen language), as i and j "sound" a lot like integer loop counters or the like, which is very confusing:-).
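
With descriptive names in place, the loop could be sketched roughly as follows. This version works on any iterable of lines rather than on files, so it can be tried without touching the disk. The function names `leading_int` and `atom_numbers` and the sample lines are hypothetical. It also accepts a leading minus sign, since the question mentions values down to -19.

```python
import re

def leading_int(field):
    """Return the optionally signed leading integer of field, or None."""
    mo = re.match(r'-?\d+', field)
    return int(mo.group()) if mo else None

def atom_numbers(lines):
    """Yield the number found in the field at index 5 of each ATOM line."""
    for line in lines:
        fields = line.split()
        if len(fields) <= 5 or fields[0] != "ATOM":
            continue
        number = leading_int(fields[5])
        if number is not None:
            yield number

# Made-up sample lines (assumption), including ones that should be skipped.
sample = [
    "ATOM 1 N MET A 28A 27.3",
    "ATOM 2 CA MET A -19 26.2",
    "TER",
    "ATOM 3 C MET A XX 1.0",
]
numbers = list(atom_numbers(sample))
print(numbers)
```

To use it on a real file, pass the open file object as `lines` and write each yielded number to the output file followed by a newline.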

Upvotes: 0
