Reputation: 1089
I am new to Python. I use Fortran to generate the data file I wish to read. For many reasons I would like to use Python rather than Fortran to calculate averages and statistics on the data.
I need to read the entries in the first three rows as strings, and then the data, which begins in the fourth row, as numbers. I don't need the first column, but I do need each of the remaining columns as its own array.
# Instantaneous properties
# MC_STEP Density Pressure Energy_Total
# (molec/A^3) (bar) (kJ/mol)-Ext
0 0.34130959E-01 0.52255964E+05 0.26562549E+04
10 0.34130959E-01 0.52174646E+05 0.25835710E+04
20 0.34130959E-01 0.52050492E+05 0.25278775E+04
And the data goes on for thousands, and sometimes millions of lines.
I have tried the following, but I run into problems: I can't analyze the lists I have made, and I can't seem to convert them to arrays. I would prefer to create arrays to begin with, but converting my lists to arrays would work too. With my method I get an error when I try to use an element of one of the lists, e.g. Energy(i).
with open('nvt_test_1.out.box1.prp1') as f:
    Title = f.readline()
    Properties = f.readline()
    Units = f.readline()
    Density = []
    Pressure = []
    Energy = []
    for line in f:
        row = line.split()
        Density.append(row[1])
        Pressure.append(row[2])
        Energy.append(row[3])
I appreciate any help!
Upvotes: 0
Views: 2784
Reputation: 169
You can also use the csv module's DictReader to read each row into a dictionary, as follows:
import csv

with open('filename', 'r') as f:
    # csv needs a single-character delimiter; skipinitialspace absorbs runs of spaces
    reader = csv.DictReader(f, delimiter=' ', skipinitialspace=True,
                            fieldnames=('MC_STEP', 'DENSITY', 'PRESSURE', 'ENERGY_TOTAL'))
    for row in reader:
        Density.append(float(row['DENSITY']))
        Pressure.append(float(row['PRESSURE']))
        Energy.append(float(row['ENERGY_TOTAL']))
Of course, this assumes that the file is formatted more like a CSV (that is, no comments). If the file does have comments at the top, skip them before initializing the DictReader by calling next(f) once per comment line (three in the example file):
for _ in range(3):
    next(f)
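As a rough sketch of the comment-skipping idea, the version below filters out every line that starts with '#' instead of counting header lines, then hands the remaining lines to DictReader. The helper name read_columns and the in-memory sample are illustrative assumptions; with a real file you would pass the open file object instead of io.StringIO.

```python
import csv
import io

# In-memory stand-in for the data file (assumed layout, copied from the question).
sample = """# Instantaneous properties
# MC_STEP Density Pressure Energy_Total
# (molec/A^3) (bar) (kJ/mol)-Ext
0   0.34130959E-01  0.52255964E+05  0.26562549E+04
10  0.34130959E-01  0.52174646E+05  0.25835710E+04
20  0.34130959E-01  0.52050492E+05  0.25278775E+04
"""

def read_columns(lines):
    """Hypothetical helper: return a dict of float lists, one per data column."""
    fieldnames = ('MC_STEP', 'DENSITY', 'PRESSURE', 'ENERGY_TOTAL')
    # Drop comment lines before the csv reader ever sees them.
    data_lines = (line for line in lines if not line.lstrip().startswith('#'))
    reader = csv.DictReader(data_lines, delimiter=' ', skipinitialspace=True,
                            fieldnames=fieldnames)
    # The first column (MC_STEP) is not needed, so only keep the rest.
    columns = {name: [] for name in fieldnames[1:]}
    for row in reader:
        for name in columns:
            columns[name].append(float(row[name]))
    return columns

columns = read_columns(io.StringIO(sample))
print(columns['ENERGY_TOTAL'][0])
```

DictReader accepts any iterable of strings, which is why a generator of filtered lines works here.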
Upvotes: 2
Reputation: 31
You can think of a list in Python as the counterpart of an array in other languages, and it is well optimised. If you have special needs, there is an array type available in the standard library, but it is rarely used. Alternatively, there is numpy.array, which is designed for scientific computation; you have to install the NumPy package for that.
Before performing calculations, convert each string to a float, as in energy.append(float(row[3])).
You can also convert a whole row at once using the map function (wrapped in list so that the result can be indexed in Python 3):
row = list(map(float, line.split()))
Last, as @Hamms said, access the elements using square brackets: e = energy[i]
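If NumPy is available, one possible way to get arrays directly is sketched below: read the three header lines as strings, then let numpy.loadtxt parse the rest. The helper name read_properties and the in-memory sample are assumptions made for illustration; with a real file you would pass the object returned by open().

```python
import io
import numpy as np

# In-memory stand-in for the data file (assumed layout, copied from the question).
sample = """# Instantaneous properties
# MC_STEP Density Pressure Energy_Total
# (molec/A^3) (bar) (kJ/mol)-Ext
0 0.34130959E-01 0.52255964E+05 0.26562549E+04
10 0.34130959E-01 0.52174646E+05 0.25835710E+04
20 0.34130959E-01 0.52050492E+05 0.25278775E+04
"""

def read_properties(f):
    """Hypothetical helper: return the three header strings and three float arrays."""
    title = f.readline().strip()
    properties = f.readline().strip()
    units = f.readline().strip()
    # The file position is now at the first data row.
    # usecols skips column 0 (MC_STEP); unpack=True yields one array per column.
    density, pressure, energy = np.loadtxt(f, usecols=(1, 2, 3), unpack=True)
    return title, properties, units, density, pressure, energy

title, properties, units, density, pressure, energy = read_properties(io.StringIO(sample))

print(energy[0])                    # index with square brackets, not Energy(i)
print(energy.mean(), energy.std())  # statistics come as array methods
```

Because loadtxt treats lines starting with '#' as comments by default, it would also cope with the header lines on its own; they are read explicitly here only because the question asks for them as strings.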
Upvotes: 2
Reputation: 210842
I would use the pandas module for this task:
import pandas as pd
In [9]: df = pd.read_csv('a.csv', sep=r'\s+',
                         comment='#', skiprows=3, header=None,
                         names=['MC_STEP','Density','Pressure','Energy_Total'])
Data Frame:
In [10]: df
Out[10]:
   MC_STEP   Density   Pressure  Energy_Total
0        0  0.034131  52255.964     2656.2549
1       10  0.034131  52174.646     2583.5710
2       20  0.034131  52050.492     2527.8775
Average values for all columns:
In [11]: df.mean()
Out[11]:
MC_STEP            10.000000
Density             0.034131
Pressure        52160.367333
Energy_Total     2589.234467
dtype: float64
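Since the question asks for each column as its own array, here is a small sketch of how that might look once the data is in a DataFrame. The in-memory sample stands in for the real file, and the variable names are illustrative; comment='#' alone is used to drop the header lines here, which is an assumption that every non-data line starts with '#'.

```python
import io
import pandas as pd

# In-memory stand-in for the data file (assumed layout, copied from the question).
sample = """# Instantaneous properties
# MC_STEP Density Pressure Energy_Total
# (molec/A^3) (bar) (kJ/mol)-Ext
0 0.34130959E-01 0.52255964E+05 0.26562549E+04
10 0.34130959E-01 0.52174646E+05 0.25835710E+04
20 0.34130959E-01 0.52050492E+05 0.25278775E+04
"""

names = ['MC_STEP', 'Density', 'Pressure', 'Energy_Total']
df = pd.read_csv(io.StringIO(sample), sep=r'\s+', comment='#',
                 header=None, names=names)

# Each column can be pulled out as a NumPy array.
density = df['Density'].to_numpy()
pressure = df['Pressure'].to_numpy()
energy = df['Energy_Total'].to_numpy()

# Mean and standard deviation of the physical columns in one call.
stats = df[['Density', 'Pressure', 'Energy_Total']].agg(['mean', 'std'])
print(stats)
```

The arrays support square-bracket indexing such as energy[0], and stats.loc['mean', 'Pressure'] picks out a single value from the summary.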
Upvotes: 2