nieka

Reputation: 269

Is there a fast way in Python to read in data from a file, separated by empty lines?

I want to read in data from a txt file that looks like this:

# input data
50.  310.  3.0E-07  23.06
50.  310.5  2.4E-07  5.73

50.5  310.  2.9E-07  16.30
50.5  310.5  2.2E-07  4.26

51.  310.  2.3E-07  6.40
51.  310.5  2.7E-07  8.19

So I have several blocks, each with a few lines of data and four values per line, and each block ends with a blank line.

Currently I read in my data with numpy like this, which gives me one array per column, containing the values from all the blocks.

 x,y,z,err = np.loadtxt(path_to_file, unpack=True)

But what I really want is a list of lists for each column, so that I can separate the data from each block, which is delimited by a blank line in the data file. The result, for example for the third column, should look like this:

# the result I want to achieve
z_list = array([[3.0E-7, 2.4E-07],
   [2.9E-07, 2.2E-07],
   [2.3E-07, 2.7E-07]])

Is there a way in Python or numpy to read my data and split it into blocks at the blank lines?

Upvotes: 0

Views: 830

Answers (2)

Chiheb Nexus

Reputation: 9257

You can do something like this using groupby from the itertools module and literal_eval from the ast module.

Assuming your input file is called input_file:

from itertools import groupby
from ast import literal_eval as le

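with open('input_file', 'r') as f:
    # One list of fields per line; blank lines become [], '#' comment lines are dropped
    data = [k.split() for k in f.read().splitlines() if not k.lstrip().startswith('#')]

final = []
# groupby yields alternating runs of data lines (key True) and blank lines (key False)
for has_data, v in groupby(data, lambda x: x != []):
    if has_data:
        final.append([le(k[2]) for k in v])  # k[2] is the third column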

print(final)

Output:

[[3e-07, 2.4e-07], [2.9e-07, 2.2e-07], [2.3e-07, 2.7e-07]]

Then you can convert the final list into a numpy array or whatever else fits your needs.
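As a rough sketch of that last step, the same groupby idea can be extended to all four columns and converted to a numpy array. The helper name read_blocks and the in-memory sample are only for illustration, and the conversion assumes every block has the same number of rows (otherwise np.array cannot build a regular 3-D array).

```python
import io
from itertools import groupby

import numpy as np


def read_blocks(lines):
    """Return a list of blocks; each block is a list of rows of floats.

    Hypothetical helper: blank lines separate blocks, '#' lines are skipped.
    """
    rows = (line.split() for line in lines if not line.lstrip().startswith('#'))
    return [[[float(x) for x in row] for row in group]
            for has_data, group in groupby(rows, key=bool) if has_data]


# Sample data copied from the question, held in memory instead of a file
sample = io.StringIO(
    "50.  310.  3.0E-07  23.06\n"
    "50.  310.5  2.4E-07  5.73\n"
    "\n"
    "50.5  310.  2.9E-07  16.30\n"
    "50.5  310.5  2.2E-07  4.26\n"
    "\n"
    "51.  310.  2.3E-07  6.40\n"
    "51.  310.5  2.7E-07  8.19\n"
    "\n"
)

blocks = np.array(read_blocks(sample))  # axes: (block, row in block, column)
z_list = blocks[:, :, 2]                # third column, one row of values per block
print(z_list)
```

With a real file, pass the open file object to read_blocks instead of the StringIO sample; the other columns are available as blocks[:, :, 0], blocks[:, :, 1] and blocks[:, :, 3].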

Upvotes: 1

user7345804

You can also read the four columns without importing any modules.

def read_data(filename):
    """
    filename    :   "/Users/.../Desktop/.../filename.txt"
    """
    # 4 columns in your example
    col_1, col_2, col_3, col_4 = [], [], [], []
    with open(filename, 'r') as datafile:
        for line in datafile:
            data = line.split()
            # skip blank separator lines and '#' comment lines
            if not data or data[0].startswith('#'):
                continue
            col_1.append(float(data[0])) # 1st column
            col_2.append(float(data[1])) # 2nd column
            col_3.append(float(data[2])) # 3rd column
            col_4.append(float(data[3])) # 4th column
    return col_1, col_2, col_3, col_4

Then you can collect the four columns into a list of lists like this (note that this gives one flat list per column, not one list per block):

data_1, data_2, data_3, data_4 = read_data(filename)
data = [data_1, data_2, data_3, data_4]
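To actually split the data at the blank lines without any imports, one option is to collect rows into a current block and close that block whenever a blank line appears. The following is a minimal sketch under that assumption; the function name read_blocks_plain and the in-memory sample are hypothetical, not part of the answer above.

```python
def read_blocks_plain(lines):
    """Group whitespace-separated rows into blocks separated by blank lines."""
    blocks, current = [], []
    for line in lines:
        fields = line.split()
        if fields and fields[0].startswith('#'):
            continue                     # comment line, ignore it
        if fields:
            current.append([float(v) for v in fields])
        elif current:
            blocks.append(current)       # a blank line closes the current block
            current = []
    if current:                          # the file may not end with a blank line
        blocks.append(current)
    return blocks


# Sample data copied from the question; a real file object works the same way
sample = """# input data
50.  310.  3.0E-07  23.06
50.  310.5  2.4E-07  5.73

50.5  310.  2.9E-07  16.30
50.5  310.5  2.2E-07  4.26

51.  310.  2.3E-07  6.40
51.  310.5  2.7E-07  8.19
""".splitlines()

blocks = read_blocks_plain(sample)
# Third column of every row, grouped per block
z_list = [[row[2] for row in block] for block in blocks]
print(z_list)
```

Unlike the numpy conversion, this also works when the blocks have different numbers of rows, since the result stays a plain nested list.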

Upvotes: 1
