Reputation: 7124
I have this text file: http://henke.lbl.gov/tmp/xray6286.dat
I would like to extract the energy and transmission columns from it.
Unfortunately, it doesn't have a clear delimiter: the values are separated by runs of spaces of varying length.
Running something like
import csv
with open('xray6286.dat', 'U') as data:
    reader = csv.reader(data, delimiter=' ')
    for line in reader:
        print line
would result in an output like:
['', 'Cu', 'Density=8.96', 'Thickness=100.', 'microns']
['', 'Photon', 'Energy', '(eV),', 'Transmission']
['', '', '', '', '5000.0', '', '', '', '', '', '0.52272E-07']
['', '', '', '', '5250.0', '', '', '', '', '', '0.42227E-06']
['', '', '', '', '5500.0', '', '', '', '', '', '0.24383E-05']
I can brute force it to give me the values I want with the following code:
import csv
energy = []
transmission = []
with open('xray6286.dat', 'U') as data:
    # Treat each physical line as a single field
    reader = csv.reader(data, delimiter='\n')
    for line in reader:
        # Skip the two header lines
        if reader.line_num > 2:
            cleaned_line = []
            # Keep only the non-empty pieces left after splitting on single spaces
            for word in line[0].split(' '):
                if word:
                    cleaned_line.append(word)
            energy.append(cleaned_line[0])
            transmission.append(cleaned_line[1])
But I was wondering whether anyone knows a more elegant way of achieving this?
Upvotes: 2
Views: 1059
Reputation: 7124
The regex split method can separate the data points on runs of whitespace of any length. Strip the line first; otherwise its leading spaces produce an empty first element.
import re
for word in re.split(r'\s+', line.strip()):
    print word
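To see how re.split behaves on a row like those in the question, here is a small sketch. The sample row is reconstructed from the output shown in the question, so its exact spacing is an assumption, and the variable names are only illustrative. It also shows str.split() with no arguments, which splits on any run of whitespace and ignores leading and trailing whitespace.

```python
import re

# Sample row reconstructed from the question's output (spacing is assumed).
line = "    5000.0      0.52272E-07"

# Splitting the raw line leaves an empty string in front, because the
# line begins with whitespace.
raw_fields = re.split(r"\s+", line)

# Stripping the line first removes that leading empty field.
fields = re.split(r"\s+", line.strip())

# str.split() with no arguments gives the same result without a regex.
plain_fields = line.split()

print(raw_fields)
print(fields)
print(plain_fields)
```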
Upvotes: 0
Reputation: 5708
You can store the results in a data structure, then iterate through it and delete the empty entries. @alfasin suggested the best idea, though, which is to use filter.
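A minimal sketch of the store-then-clean idea, assuming a row reconstructed from the question's output. Rather than deleting entries while iterating, which is easy to get wrong, it builds a new list that keeps only the non-empty strings.

```python
# Sample row reconstructed from the question's output (spacing is assumed).
line = "    5000.0      0.52272E-07"

# Splitting on a single space leaves an empty string for every extra space.
words = line.split(' ')

# Keep only the non-empty entries.
words = [w for w in words if w]

print(words)
```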
Upvotes: 0
Reputation: 53525
Using "if word:" is perfectly fine. Another option would be to use filter to drop the empty strings, by replacing:
for word in line[0].split(' '):
with:
for word in filter(bool, line[0].split(' ')):
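Put together, the filter variant could look roughly like the sketch below. The sample list stands in for the file and is reconstructed from the output in the question (exact spacing assumed); a real script would loop over the open file instead. In Python 3, filter returns an iterator, so the sketch wraps it in list() to allow indexing.

```python
# Stand-in for the file contents, reconstructed from the question's output.
sample = [
    " Cu Density=8.96 Thickness=100. microns",
    " Photon Energy (eV), Transmission",
    "    5000.0      0.52272E-07",
    "    5250.0      0.42227E-06",
    "    5500.0      0.24383E-05",
]

energy = []
transmission = []
for line in sample[2:]:  # skip the two header lines
    # filter(bool, ...) drops the empty strings produced by split(' ')
    words = list(filter(bool, line.split(' ')))
    energy.append(words[0])
    transmission.append(words[1])

print(energy)
print(transmission)
```

The values are kept as strings, as in the question; wrapping them in float() would be the natural next step if numbers are needed.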
Upvotes: 1