Ioana Sovago

Reputation: 61

Read numbers from a text file in Python

I am trying to read a column of numbers from a text file that looks like this:

some text and numbers..., then:

 q-pt=    1    0.000000  0.000000  0.000000      1.0000000000
   1      -0.066408              0.0000000                      
   2      -0.053094              0.0000000                      
   3      -0.037643              0.0000000 
   ...
   156    3107.735577            6.8945617
...more text file

I am interested in reading the second column, the one that contains -0.066408, -0.053094 and so on.
The code I have tried to write is not doing the job, yet it does not raise any error. I have tried this:

import re                                                                            
import sys                                                                           
from string import atof                                                              
from math import exp                                                                 
from numpy import *                                                                  

file1 = open('castepfreq.dat', 'w')                                                  
with open('xd_geo_Efield.phonon') as file:                                           
    File = file.readlines()                                                          
    p1 = re.compile("q-pt=    1    0.000000  0.000000  0.000000      1.0000000000")  
    for i in range(len(File)):                                                       
        m1 = p1.search(File[i])                                                      
        if m1:
            read = int(float(File[i+1][10:23]))      
            freq = (read)                                                            
    print >> file1, freq    
file1.close()

If anyone can help me with this, that would be great.

Upvotes: 2

Views: 1156

Answers (1)

Padraic Cunningham

Reputation: 180522

You can split each line on whitespace and then take the second element:

with open('xd_geo_Efield.phonon') as f:
    col = [line.split()[1] for line in f]
    print(col)

If your input is:

q-pt=    1    0.000000  0.000000  0.000000      1.0000000000
1      -0.066408              0.0000000
2      -0.053094              0.0000000
3      -0.037643              0.0000000

Output will be a list of strings, including the 1 from the header line:

['1', '-0.066408', '-0.053094', '-0.037643']

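Or using itertools and transposing (izip and imap are Python 2 only), which gives the column as a single tuple inside a list, [('1', '-0.066408', '-0.053094', '-0.037643')]:

from itertools import izip, islice, imap
with open('xd_geo_Efield.phonon') as f:
    # split every line, transpose rows into columns, keep only the column at index 1
    col = islice(izip(*imap(str.split, f)), 1, 2)
    print(list(col))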
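
On Python 3 the built-in zip and map are already lazy, so the same transposition can be written without izip and imap. A minimal sketch, assuming an in-memory io.StringIO as a stand-in for the real file; the sample text and the second_column name are only for illustration:

```python
import io
from itertools import islice

# Illustrative stand-in for the opened file, not the real .phonon data.
sample = io.StringIO(
    "q-pt=    1    0.000000  0.000000  0.000000      1.0000000000\n"
    "1      -0.066408              0.0000000\n"
    "2      -0.053094              0.0000000\n"
    "3      -0.037643              0.0000000\n"
)


def second_column(lines):
    """Transpose whitespace-split rows and return the second column as a tuple."""
    # zip(*rows) turns rows into columns; it stops at the shortest row,
    # so longer rows are truncated to the common number of fields.
    columns = zip(*map(str.split, lines))
    # islice(..., 1, 2) yields only the column at index 1.
    return next(islice(columns, 1, 2))


print(second_column(sample))
```

Because zip stops at the shortest row, a blank line anywhere in the input would leave no columns at all, so this form only suits input where every line has at least two fields.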

If you want numbers rather than strings, cast each value to float:

[float(line.split()[1]) for line in f]

Also, if you want to skip the header line so that its 1 is not included, call next(f) on the file object before running the rest of the code, i.e.:

with open('xd_geo_Efield.phonon') as f:
    next(f)  # consume the header line
    col = [float(line.split()[1]) for line in f]
    print(col)

Which would output:

[-0.066408, -0.053094, -0.037643]

If there are lines before the data that you want to ignore, so that reading only starts at the line q-pt=.., you can use itertools.dropwhile to skip the lines at the start:

from itertools import dropwhile

with open('xd_geo_Efield.phonon') as f:
    col = [float(line.split()[1]) for line in dropwhile(
           lambda x: not x.startswith("q-pt="), f)]
    print(list(col))

If you want to also ignore that line, you can call next again but this time on the dropwhile object:

from itertools import dropwhile

with open('xd_geo_Efield.phonon') as f:
    dp = dropwhile(lambda x: not x.startswith("q-pt="), f)
    next(dp)
    col = [float(line.split()[1]) for line in dp]
    print(list(col))

So for the input:

some 1 1 1 1 1
meta 2 2 2 2 2
data 3 3 3 3 3
and 4 4 4 4 4
numbers 5 5 5 5 5
q-pt=    1    0.000000  0.000000  0.000000      1.0000000000
1      -0.066408              0.0000000
2      -0.053094              0.0000000
3      -0.037643              0.0000000
3      -0.037643              0.0000000

The output will be:

[-0.066408, -0.053094, -0.037643, -0.037643]
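
One caveat with the snippet above: if no line starts with q-pt=, dropwhile is already exhausted and next(dp) raises StopIteration. A minimal Python 3 sketch that passes a default to next to avoid this; the column_after_marker name and its marker parameter are hypothetical, not part of the original code:

```python
import io
from itertools import dropwhile


def column_after_marker(lines, marker="q-pt="):
    """Return the second column as floats for every line after the marker line."""
    # Skip everything before the first line that starts with the marker.
    rest = dropwhile(lambda line: not line.startswith(marker), lines)
    # Consume the marker line itself; the default avoids StopIteration
    # when the marker is missing and `rest` is already exhausted.
    next(rest, None)
    return [float(line.split()[1]) for line in rest]


# Illustrative in-memory input mirroring the example above.
sample = io.StringIO(
    "some 1 1 1 1 1\n"
    "meta 2 2 2 2 2\n"
    "q-pt=    1    0.000000  0.000000  0.000000      1.0000000000\n"
    "1      -0.066408              0.0000000\n"
    "2      -0.053094              0.0000000\n"
)
print(column_after_marker(sample))
```

With a missing marker this returns an empty list instead of raising, which may or may not be what you want; raising an explicit error is an equally reasonable choice.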

If the lines have leading spaces, lstrip them first:

from itertools import dropwhile, imap, takewhile

with open('xd_geo_Efield.phonon') as f:
    # imap is Python 2 only; on Python 3 drop that import and use the built-in map
    dp = dropwhile(lambda x: not x.startswith("q-pt="), imap(str.lstrip,f))
    next(dp)
    col = [float(line.split(None,2)[1]) for line in takewhile(lambda x: x.strip() != "", dp)]
    print(list(col))

takewhile keeps taking lines until it reaches the first empty line after the data, so nothing beyond that line is parsed.
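
Putting the pieces together for Python 3, here is a rough sketch that also writes the values out one per line, as the question's code tries to do with castepfreq.dat. It assumes the data block ends at a blank line; the read_frequencies and write_frequencies names are made up for this example:

```python
import io
from itertools import dropwhile, takewhile


def read_frequencies(lines, marker="q-pt="):
    """Return the second column of the block that follows the marker line."""
    # Strip leading spaces so startswith() can match the marker.
    stripped = map(str.lstrip, lines)
    block = dropwhile(lambda line: not line.startswith(marker), stripped)
    next(block, None)  # skip the marker line itself, if there is one
    # Stop at the first blank line (assumed to end the data block).
    rows = takewhile(lambda line: line.strip() != "", block)
    return [float(line.split(None, 2)[1]) for line in rows]


def write_frequencies(freqs, out):
    """Write one value per line to an open text file object."""
    for value in freqs:
        out.write(f"{value}\n")


# Illustrative in-memory input; with real files you would pass open(...) objects.
sample = io.StringIO(
    "some header text\n"
    " q-pt=    1    0.000000  0.000000  0.000000      1.0000000000\n"
    "   1      -0.066408              0.0000000\n"
    "   2      -0.053094              0.0000000\n"
    "\n"
    "more text\n"
)
freqs = read_frequencies(sample)
out = io.StringIO()
write_frequencies(freqs, out)
print(freqs)
```

If the real file has no blank line after the last data row, the takewhile condition would need to change, for example to stop at the first line whose second field is not a number.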

Upvotes: 2
