Reputation: 109
I'm a beginner in Python and I can't find an answer to my problem. I have a file with some data and I want to get numbers from this file. My program looks like this:
class Mojaklasa:
    def przenumeruj_pdb(self):
        nazwa=raw_input('Podaj nazwe pliku: ')
        plik=open(nazwa).readlines()
        write=open('out.txt','w')
        for i in plik:
            j=i.split()
            if len(j)>5:
                if j[0] == "ATOM":
                    write.write(j[5])
                    write.write("\n")
        zapis.close()
The field at index 5 holds numbers from -19 to 100, and for those it works perfectly. But sometimes that field contains a number followed by a letter, e.g. 28A, and I only want 28. Converting to int doesn't work. How can I do this?
Upvotes: 1
Views: 243
Reputation: 308138
You can replace a lot of logic with a properly formatted regular expression.
import re

for i in plik:
    # ATOM, four skipped fields, then an optionally negative integer
    m = re.match(r'ATOM\s+.*?\s+.*?\s+.*?\s+.*?\s+(-?\d+)', i)
    if m:
        write.write(m.group(1) + '\n')
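A self-contained sketch of the same idea, written so that it also runs on Python 3. The helper name `residue_numbers` and the sample lines are made up for illustration, and the pattern uses `\S+` rather than `.*?` for the skipped fields, assuming the columns are whitespace-separated, so that a skipped field cannot stretch across several columns:

```python
import re

# ATOM, four whitespace-separated fields, then an optionally negative integer.
ATOM_RE = re.compile(r'ATOM\s+\S+\s+\S+\s+\S+\s+\S+\s+(-?\d+)')

def residue_numbers(lines):
    """Yield the number at the start of field index 5 of each ATOM line, as a string."""
    for line in lines:
        m = ATOM_RE.match(line)
        if m:
            yield m.group(1)

# Hypothetical sample data, not taken from a real file.
sample = [
    "ATOM      1  N   MET A  28A     27.340  24.430   2.614",
    "ATOM      2  CA  MET A -19      26.266  25.413   2.842",
    "HETATM    3  O   HOH A 100      10.000  10.000  10.000",
]
print(list(residue_numbers(sample)))  # ['28', '-19']
```

The third sample line is skipped because `re.match` anchors at the start of the line, where it finds `HETATM` rather than `ATOM`.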
Upvotes: 1
Reputation: 180401
In Python 2, str.translate will remove any letters:
s = "10A"
from string import ascii_letters
print(int(s.translate(None,ascii_letters)))
10
Or use re:
import re
print(int(re.findall(r"-?\d+", s)[0]))
10
In [22]: s = "-100A"
In [23]: int(re.findall("\-?\d+",s)[0])
Out[23]: -100
In [24]: int(s.translate(None,ascii_letters))
Out[24]: -100
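Note that `s.translate(None, ascii_letters)` is the Python 2 form; on Python 3, `str.translate` takes a single table argument. A minimal sketch of the Python 3 equivalent, building the table with `str.maketrans` (the helper name `strip_letters` is hypothetical):

```python
from string import ascii_letters

# Table that maps every ASCII letter to None, i.e. deletes it (Python 3 only).
DELETE_LETTERS = str.maketrans('', '', ascii_letters)

def strip_letters(s):
    """Return s with all ASCII letters removed."""
    return s.translate(DELETE_LETTERS)

print(int(strip_letters("10A")))
print(int(strip_letters("-100A")))
```

Like the Python 2 version, this removes letters wherever they occur, so a value such as "1A2" would become 12 rather than 1.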
I would also change your code a bit:
class Mojaklasa:  # unless there are more methods I would not use a class
    def przenumeruj_pdb(self):
        nazwa = raw_input('Podaj nazwe pliku: ')
        with open(nazwa) as plik, open("out.txt", "w") as write:  # with will close your files
            for line in plik:  # iterate over file object
                j = line.split()
                if len(j) > 5 and j[0] == "ATOM":  # same as nested if's
                    write.write("{}\n".format(j[5].translate(None, ascii_letters)))
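Following the comment about not needing a class, the same loop could be a plain function. This is only a sketch: the function name `write_residue_numbers` and its parameters are assumptions, it takes paths as arguments instead of prompting for one, and it uses `re.sub` rather than `translate` so that it runs on both Python 2 and Python 3:

```python
import re

def write_residue_numbers(in_path, out_path="out.txt"):
    """Write the numeric part of field index 5 of each ATOM line to out_path.

    Returns the number of lines written.
    """
    count = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            fields = line.split()
            if len(fields) > 5 and fields[0] == "ATOM":
                # Drop ASCII letters, e.g. "28A" -> "28"; "-19" stays "-19".
                dst.write(re.sub(r"[A-Za-z]", "", fields[5]) + "\n")
                count += 1
    return count
```

Taking the paths as parameters keeps the function easy to call from other code; the interactive prompt can stay in a small wrapper if it is still wanted.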
Upvotes: 3
Reputation: 881595
Here's an overall-improved approach:
import re

class Mojaklasa:
    def przenumeruj_pdb(self):
        nazwa=raw_input('Podaj nazwe pliku: ')
        with open(nazwa) as plik, open('out.txt','w') as zapis:
            for i in plik:
                j = i.split()
                if len(j) <= 5: continue
                if j[0] == "ATOM":
                    # optional minus sign (the data goes down to -19), then digits
                    mo = re.match(r'-?\d+', j[5])
                    if mo is None: continue
                    zapis.write(mo.group() + '\n')
I have not improved your choice of identifiers (except for some confusion between write and zapis), but improvements include (a) useful indenting, (b) use of with to open files (so they're automatically closed), (c) extraction of only the leading number via re (the one most germane to your Q), (d) use of if/continue lines rather than indenting (as "flat is better than nested"). Plus perhaps some I may be forgetting :-)
I would also recommend renaming i to line and j to fields (or whatever the equivalents are in your chosen language), as i and j "sound" a lot like integer loop counters or the like, which is very confusing :-).
Upvotes: 0
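With more descriptive names for the loop variables, the per-line logic of the last answer might read as follows. This is a sketch, not the answerer's code: the helper names `leading_int` and `atom_numbers` are made up, and the pattern accepts an optional minus sign, assuming the negative values mentioned in the question should be kept:

```python
import re

def leading_int(field):
    """Return the integer at the start of field, or None if there is none."""
    mo = re.match(r'-?\d+', field)
    return int(mo.group()) if mo else None

def atom_numbers(lines):
    """Yield the leading integer of field index 5 for each ATOM line."""
    for line in lines:
        fields = line.split()
        if len(fields) <= 5 or fields[0] != "ATOM":
            continue
        number = leading_int(fields[5])
        if number is not None:
            yield number
```

For example, `leading_int("28A")` gives 28 and `leading_int("-19")` gives -19, while a field with no leading number, such as "A28", gives None and is skipped.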