I am trying to read a column of numbers from a text file that looks like this:
some text and numbers..., then:
q-pt= 1 0.000000 0.000000 0.000000 1.0000000000
1 -0.066408 0.0000000
2 -0.053094 0.0000000
3 -0.037643 0.0000000
...
156 3107.735577 6.8945617
...more text file
I am interested in reading the second column, the one that contains -0.066408, -0.053094 and so on.
The code I have tried to write does not do the job, yet it gives no error. I have tried this:
import re
import sys
from string import atof
from math import exp
from numpy import *

file1 = open('castepfreq.dat', 'w')
with open('xd_geo_Efield.phonon') as file:
    File = file.readlines()

p1 = re.compile("q-pt= 1 0.000000 0.000000 0.000000 1.0000000000")
for i in range(len(File)):
    m1 = p1.search(File[i])
    if m1:
        read = int(float(File[i+1][10:23]))
        freq = (read)
        print >> file1, freq
file1.close()
If anyone can help me with this, that would be great.
You can split each line on whitespace and then extract the second element:
with open('xd_geo_Efield.phonon') as f:
    col = [line.split()[1] for line in f]
print(col)
If your input is:
q-pt= 1 0.000000 0.000000 0.000000 1.0000000000
1 -0.066408 0.0000000
2 -0.053094 0.0000000
3 -0.037643 0.0000000
Output will be:
['1', '-0.066408', '-0.053094', '-0.037643']
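A minimal Python 3 sketch of the same split-based idea, assuming the file contents are fed in through io.StringIO instead of the real file (the helper name second_column is hypothetical, for illustration only):

```python
import io


def second_column(lines):
    """Return the second whitespace-separated field of every line, as strings."""
    # str.split() with no argument splits on any run of whitespace,
    # so uneven column padding does not matter
    return [line.split()[1] for line in lines]


# stand-in for open('xd_geo_Efield.phonon'); iterating it yields lines like a file
sample = io.StringIO(
    "q-pt= 1 0.000000 0.000000 0.000000 1.0000000000\n"
    "1 -0.066408 0.0000000\n"
    "2 -0.053094 0.0000000\n"
    "3 -0.037643 0.0000000\n"
)
col = second_column(sample)
print(col)  # ['1', '-0.066408', '-0.053094', '-0.037643']
```

With the real file you would pass the open file object instead. A blank line (or any line with fewer than two fields) makes line.split()[1] raise IndexError, which is one reason to restrict the lines you parse.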
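Or, in Python 2, using itertools to transpose the rows and keep only the second column (this gives the column as a tuple inside a list, [('1', '-0.066408', '-0.053094', '-0.037643')]):
from itertools import izip, islice, imap  # Python 2 only; use zip and map in Python 3

with open('xd_geo_Efield.phonon') as f:
    col = islice(izip(*imap(str.split, f)), 1, 2)
    print(list(col))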
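izip and imap no longer exist in Python 3, where the built-in zip and map are already lazy. A sketch of the same transposition in Python 3, again assuming io.StringIO stands in for the file:

```python
import io
from itertools import islice

sample = io.StringIO(
    "q-pt= 1 0.000000 0.000000 0.000000 1.0000000000\n"
    "1 -0.066408 0.0000000\n"
    "2 -0.053094 0.0000000\n"
    "3 -0.037643 0.0000000\n"
)
# map(str.split, ...) turns each line into a list of fields;
# zip(*rows) transposes rows into columns, stopping at the shortest row's length
columns = zip(*map(str.split, sample))
# islice(columns, 1, 2) keeps only the column at index 1 (the second one)
col = list(islice(columns, 1, 2))
print(col)  # [('1', '-0.066408', '-0.053094', '-0.037643')]
```

Note that zip(*...) has to read every line before yielding anything, so for a large file the plain list comprehension is the lighter option.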
If you want numbers rather than strings, cast each value to float:
[float(line.split()[1]) for line in f]
Also, if you want to skip the header line (and so ignore its 1), call next(f) on the file object before running the rest of the code, i.e.:
with open('xd_geo_Efield.phonon') as f:
    next(f)
    col = [float(line.split()[1]) for line in f]
    print(col)
Which would output:
[-0.066408, -0.053094, -0.037643]
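A Python 3 sketch of skipping the header and casting in one pass, assuming io.StringIO in place of the file (it supports next() just like a file object):

```python
import io

sample = io.StringIO(
    "q-pt= 1 0.000000 0.000000 0.000000 1.0000000000\n"
    "1 -0.066408 0.0000000\n"
    "2 -0.053094 0.0000000\n"
    "3 -0.037643 0.0000000\n"
)
next(sample)  # consume the q-pt= header so its "1" is not collected
# iteration resumes at the line after the one next() consumed
col = [float(line.split()[1]) for line in sample]
print(col)  # [-0.066408, -0.053094, -0.037643]
```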
If there is data you want to ignore and you only want to start at the q-pt= line, you can use itertools.dropwhile to skip the lines before it:
from itertools import dropwhile

with open('xd_geo_Efield.phonon') as f:
    col = [float(line.split()[1]) for line in dropwhile(
        lambda x: not x.startswith("q-pt="), f)]
    print(col)
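A small Python 3 sketch of what dropwhile does here, assuming a made-up input with two metadata lines (io.StringIO stands in for the file). dropwhile discards lines only while the predicate is true, so the q-pt= line itself is kept and its "1" shows up as 1.0:

```python
import io
from itertools import dropwhile

# hypothetical sample: metadata lines, then the q-pt= header, then data
sample = io.StringIO(
    "some 1 1 1 1 1\n"
    "meta 2 2 2 2 2\n"
    "q-pt= 1 0.000000 0.000000 0.000000 1.0000000000\n"
    "1 -0.066408 0.0000000\n"
    "2 -0.053094 0.0000000\n"
)
# drop lines until the first one starting with "q-pt="; that line and
# everything after it pass through unchanged
kept = dropwhile(lambda x: not x.startswith("q-pt="), sample)
col = [float(line.split()[1]) for line in kept]
print(col)  # [1.0, -0.066408, -0.053094]
```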
If you also want to skip the q-pt= line, call next again, but this time on the dropwhile object:
from itertools import dropwhile

with open('xd_geo_Efield.phonon') as f:
    dp = dropwhile(lambda x: not x.startswith("q-pt="), f)
    next(dp)
    col = [float(line.split()[1]) for line in dp]
    print(col)
So for the input:
some 1 1 1 1 1
meta 2 2 2 2 2
data 3 3 3 3 3
and 4 4 4 4 4
numbers 5 5 5 5 5
q-pt= 1 0.000000 0.000000 0.000000 1.0000000000
1 -0.066408 0.0000000
2 -0.053094 0.0000000
3 -0.037643 0.0000000
3 -0.037643 0.0000000
The output will be:
[-0.066408, -0.053094, -0.037643, -0.037643]
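To check that output without the real file, here is a Python 3 sketch that feeds the sample input above through the same dropwhile/next steps, with io.StringIO assumed as a stand-in for the file:

```python
import io
from itertools import dropwhile

# the sample input above, inlined as a string
sample = io.StringIO(
    "some 1 1 1 1 1\n"
    "meta 2 2 2 2 2\n"
    "data 3 3 3 3 3\n"
    "and 4 4 4 4 4\n"
    "numbers 5 5 5 5 5\n"
    "q-pt= 1 0.000000 0.000000 0.000000 1.0000000000\n"
    "1 -0.066408 0.0000000\n"
    "2 -0.053094 0.0000000\n"
    "3 -0.037643 0.0000000\n"
    "3 -0.037643 0.0000000\n"
)
dp = dropwhile(lambda x: not x.startswith("q-pt="), sample)
next(dp)  # the first item dropwhile yields is the q-pt= line itself; skip it
col = [float(line.split()[1]) for line in dp]
print(col)  # [-0.066408, -0.053094, -0.037643, -0.037643]
```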
If the lines have leading spaces, lstrip them off first:
from itertools import dropwhile, imap, takewhile

with open('xd_geo_Efield.phonon') as f:
    # Python 2; for Python 3 just use map (and drop the imap import)
    dp = dropwhile(lambda x: not x.startswith("q-pt="), imap(str.lstrip, f))
    next(dp)
    col = [float(line.split(None, 2)[1]) for line in takewhile(lambda x: x.strip() != "", dp)]
    print(col)
takewhile keeps taking lines until it hits the first empty line (for example the blank line that ends the data block), so any text after that point is not parsed.
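A Python 3 sketch of the full pipeline (lstrip, dropwhile, next, takewhile). The indented input below, with a blank line and trailing text after the block, is a made-up example, since the real file layout is only partly shown in the question:

```python
import io
from itertools import dropwhile, takewhile

# hypothetical sample: indented lines, a blank line, then more text
sample = io.StringIO(
    "  some header text\n"
    "     q-pt=    1   0.000000  0.000000  0.000000    1.0000000000\n"
    "       1     -0.066408    0.0000000\n"
    "       2     -0.053094    0.0000000\n"
    "\n"
    "  more text after the block\n"
)
# lstrip each line so startswith("q-pt=") matches despite the indentation
stripped = map(str.lstrip, sample)
dp = dropwhile(lambda x: not x.startswith("q-pt="), stripped)
next(dp)  # skip the q-pt= line itself
# stop at the first blank line; lstrip already turned "\n" into "",
# and strip() also covers lines that contain only spaces
block = takewhile(lambda x: x.strip() != "", dp)
# split(None, 2) splits on whitespace at most twice; only field [1] is needed
col = [float(line.split(None, 2)[1]) for line in block]
print(col)  # [-0.066408, -0.053094]
```

If the real file has no blank line after the data, takewhile will not stop early, and the first line of trailing text that lacks a numeric second field will raise an error instead.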