Charlie Crown

Reputation: 1089

Reading columns of data into arrays in Python

I am new to Python. I use Fortran to generate the data file I wish to read. For many reasons I would like to use Python, rather than Fortran, to calculate averages and statistics on the data.

I need to read the entries in the first three rows as strings, and then the data, which begins in the fourth row, as numbers. I don't need the first column, but I do need each of the remaining columns as its own array.

# Instantaneous properties
# MC_STEP              Density          Pressure      Energy_Total
#                  (molec/A^3)             (bar)      (kJ/mol)-Ext
       0    0.34130959E-01    0.52255964E+05    0.26562549E+04
      10    0.34130959E-01    0.52174646E+05    0.25835710E+04
      20    0.34130959E-01    0.52050492E+05    0.25278775E+04

The data goes on for thousands, and sometimes millions, of lines.

I have tried the following, but I run into problems: I can't analyze the lists I have made, and I can't seem to convert them to arrays. I would prefer to create arrays to begin with, but converting my lists to arrays would work too. With my method I get an error when I try to use an element of one of the lists, i.e. Energy(i)

with open('nvt_test_1.out.box1.prp1') as f:
    Title = f.readline()
    Properties = f.readline()
    Units = f.readline()
    Density = []
    Pressure = []
    Energy = []
    for line in f:
        row = line.split()
        Density.append(row[1])
        Pressure.append(row[2])
        Energy.append(row[3])

I appreciate any help!

Upvotes: 0

Views: 2784

Answers (3)

sir_snoopalot

Reputation: 169

You can also use the csv module's DictReader to read each row into a dictionary, as follows:

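import csv

Density, Pressure, Energy = [], [], []
with open('filename', 'r') as f:
    # csv delimiters must be a single character, so collapse runs of spaces first
    lines = (' '.join(line.split()) for line in f)
    reader = csv.DictReader(lines, delimiter=' ', fieldnames=('MC_STEP', 'DENSITY', 'PRESSURE', 'ENERGY_TOTAL'))
    for row in reader:
        Density.append(float(row['DENSITY']))
        Pressure.append(float(row['PRESSURE']))
        Energy.append(float(row['ENERGY_TOTAL']))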

Of course, this assumes that the file is formatted more like a CSV (that is, no comments). If the file does have comment lines at the top, you can skip them before initializing the DictReader by calling next(f) once per line. For the three header lines in the question:

for _ in range(3):
    next(f)
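Putting the pieces together, here is a minimal, self-contained sketch of the DictReader approach. It assumes the three-line header shown in the question and reads from an in-memory io.StringIO (the name sample is just a stand-in for the real file). Collapsing the whitespace first is one possible workaround for csv accepting only single-character delimiters, not the only one.

```python
import csv
import io

# Stand-in for the real file: the sample lines from the question.
sample = io.StringIO(
    "# Instantaneous properties\n"
    "# MC_STEP              Density          Pressure      Energy_Total\n"
    "#                  (molec/A^3)             (bar)      (kJ/mol)-Ext\n"
    "       0    0.34130959E-01    0.52255964E+05    0.26562549E+04\n"
    "      10    0.34130959E-01    0.52174646E+05    0.25835710E+04\n"
    "      20    0.34130959E-01    0.52050492E+05    0.25278775E+04\n"
)

N_HEADER_LINES = 3  # assumption: title, property names, units
fields = ("MC_STEP", "DENSITY", "PRESSURE", "ENERGY_TOTAL")
density, pressure, energy = [], [], []

for _ in range(N_HEADER_LINES):
    next(sample)  # discard one header line

# Collapse each run of whitespace to one space so ' ' works as the delimiter.
normalized = (" ".join(line.split()) for line in sample)
for row in csv.DictReader(normalized, fieldnames=fields, delimiter=" "):
    density.append(float(row["DENSITY"]))
    pressure.append(float(row["PRESSURE"]))
    energy.append(float(row["ENERGY_TOTAL"]))

print(len(energy), energy[0])
```

For a real file, replace sample with the object returned by open(...) inside a with block.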

Upvotes: 2

dido

Reputation: 31

You can treat a Python list like an array in other languages; it is well optimised for general use. If you have special needs, the standard library also has an array type (the array module), though it is rarely used. Alternatively, numpy.array is designed for scientific computation; you have to install the NumPy package for that.
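If you do go the NumPy route, a minimal sketch could look like the following. It assumes NumPy is installed and uses an in-memory io.StringIO (sample, a stand-in for the real file) holding the lines from the question. numpy.loadtxt skips lines that start with the comment character, so the three header lines are ignored here rather than kept as strings.

```python
import io

import numpy as np

# Stand-in for the real file: the sample lines from the question.
sample = io.StringIO(
    "# Instantaneous properties\n"
    "# MC_STEP              Density          Pressure      Energy_Total\n"
    "#                  (molec/A^3)             (bar)      (kJ/mol)-Ext\n"
    "       0    0.34130959E-01    0.52255964E+05    0.26562549E+04\n"
    "      10    0.34130959E-01    0.52174646E+05    0.25835710E+04\n"
    "      20    0.34130959E-01    0.52050492E+05    0.25278775E+04\n"
)

# Lines starting with '#' are treated as comments and skipped.
# usecols drops the MC_STEP column; unpack=True returns one array per column.
density, pressure, energy = np.loadtxt(
    sample, comments="#", usecols=(1, 2, 3), unpack=True
)

print(energy.mean(), energy.std())
```

Each of density, pressure and energy is a one-dimensional float array, so elements are accessed as energy[i].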

Before performing calculations, cast each string to a float, as in energy.append(float(row[3]))

Or convert the whole row at once using map (in Python 3, wrap it in list() so the result can be indexed):

row = list(map(float, line.split()))

Lastly, as @Hamms said, access elements using square brackets: e = energy[i]
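As a rough sketch of those points together, using only the standard library: the header lines are kept as strings, each data row is converted to floats with map, and elements are read with square brackets. The in-memory sample is a stand-in for the real file, and the lower-case variable names are my own.

```python
import io
import statistics

# Stand-in for the real file: the sample lines from the question.
sample = io.StringIO(
    "# Instantaneous properties\n"
    "# MC_STEP              Density          Pressure      Energy_Total\n"
    "#                  (molec/A^3)             (bar)      (kJ/mol)-Ext\n"
    "       0    0.34130959E-01    0.52255964E+05    0.26562549E+04\n"
    "      10    0.34130959E-01    0.52174646E+05    0.25835710E+04\n"
    "      20    0.34130959E-01    0.52050492E+05    0.25278775E+04\n"
)

# First three rows stay as strings; [1:] drops the leading '#'.
title = sample.readline().strip()
properties = sample.readline().split()[1:]
units = sample.readline().split()[1:]

density, pressure, energy = [], [], []
for line in sample:
    if not line.strip():
        continue  # tolerate blank lines
    row = list(map(float, line.split()))  # list() so the row can be indexed
    density.append(row[1])
    pressure.append(row[2])
    energy.append(row[3])

print(energy[0])                  # square brackets, not energy(0)
print(statistics.mean(pressure))  # average of one column
```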

Upvotes: 2

MaxU - stand with Ukraine

Reputation: 210842

I would use the pandas module for this task:

import pandas as pd

In [9]: df = pd.read_csv('a.csv', sep=r'\s+',
                         comment='#', skiprows=3, header=None,
                         names=['MC_STEP','Density','Pressure','Energy_Total'])

Data Frame:

In [10]: df
Out[10]:
   MC_STEP   Density   Pressure  Energy_Total
0        0  0.034131  52255.964     2656.2549
1       10  0.034131  52174.646     2583.5710
2       20  0.034131  52050.492     2527.8775

Average values for all columns:

In [11]: df.mean()
Out[11]:
MC_STEP            10.000000
Density             0.034131
Pressure        52160.367333
Energy_Total     2589.234467
dtype: float64
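Since the question asks for each column as its own array, here is a minimal sketch of pulling the columns out of the data frame. It assumes pandas is installed, reads the sample lines from the question out of an in-memory io.StringIO instead of a.csv, and uses sep=r'\s+' to split on runs of whitespace.

```python
import io

import pandas as pd

# Stand-in for the real file: the sample lines from the question.
sample = io.StringIO(
    "# Instantaneous properties\n"
    "# MC_STEP              Density          Pressure      Energy_Total\n"
    "#                  (molec/A^3)             (bar)      (kJ/mol)-Ext\n"
    "       0    0.34130959E-01    0.52255964E+05    0.26562549E+04\n"
    "      10    0.34130959E-01    0.52174646E+05    0.25835710E+04\n"
    "      20    0.34130959E-01    0.52050492E+05    0.25278775E+04\n"
)

df = pd.read_csv(
    sample,
    sep=r"\s+",       # split on runs of whitespace
    skiprows=3,       # the three header lines
    header=None,
    names=["MC_STEP", "Density", "Pressure", "Energy_Total"],
)

# One NumPy array per column; MC_STEP is simply not extracted.
density = df["Density"].to_numpy()
pressure = df["Pressure"].to_numpy()
energy = df["Energy_Total"].to_numpy()

print(energy[0], energy.mean())
```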

Upvotes: 2
