Ben

Reputation: 7124

Reading columns from text file without a clear delimiter

I have this text file, http://henke.lbl.gov/tmp/xray6286.dat, from which I would like to extract the energy and transmission columns.

Unfortunately it doesn't have a clear delimiter - the words are separated by a series of spaces.

Running something like

with open('xray6286.dat', 'U') as data:
    reader = csv.reader(data, delimiter=' ')
    for line in reader:
        print line

would result in an output like:

['', 'Cu', 'Density=8.96', 'Thickness=100.', 'microns']
['', 'Photon', 'Energy', '(eV),', 'Transmission']
['', '', '', '', '5000.0', '', '', '', '', '', '0.52272E-07']
['', '', '', '', '5250.0', '', '', '', '', '', '0.42227E-06']
['', '', '', '', '5500.0', '', '', '', '', '', '0.24383E-05']

I can brute force it to give me the values I want with the following code:

import csv

energy = []
transmission = []

with open('xray6286.dat', 'U') as data:
    reader = csv.reader(data, delimiter='\n')
    for line in reader:
        if reader.line_num > 2:
            cleaned_line = []
            for word in line[0].split(' '):
                if word:
                    cleaned_line.append(word)
            energy.append(cleaned_line[0])
            transmission.append(cleaned_line[1])

But I was wondering if someone knew a more elegant way to achieve this?

Upvotes: 2

Views: 1059

Answers (3)

Ben

Reputation: 7124

The regex split function, re.split, can separate the data points on any run of whitespace, however many spaces it contains.

import re

# strip() first, so the leading spaces do not produce an empty first field
for word in re.split(r'\s+', line.strip()):
    print word
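As a rough sketch of how this could replace the whole loop from the question, the snippet below applies re.split to a few sample lines. It is written for Python 3. The SAMPLE text is reconstructed from the output shown in the question, and parse_columns and its header_lines parameter are hypothetical names chosen for illustration.

```python
import re

# Sample lines reconstructed from the output in the question (assumed layout).
SAMPLE = """ Cu Density=8.96 Thickness=100. microns
 Photon Energy (eV), Transmission
    5000.0      0.52272E-07
    5250.0      0.42227E-06
    5500.0      0.24383E-05
"""


def parse_columns(text, header_lines=2):
    """Return (energy, transmission) as lists of floats.

    Hypothetical helper: skips the first `header_lines` lines and
    splits every remaining line on runs of whitespace.
    """
    energy, transmission = [], []
    for line in text.splitlines()[header_lines:]:
        # strip() avoids an empty first field caused by leading spaces
        fields = re.split(r'\s+', line.strip())
        if len(fields) < 2:
            continue  # skip blank or malformed lines
        energy.append(float(fields[0]))
        transmission.append(float(fields[1]))
    return energy, transmission


energy, transmission = parse_columns(SAMPLE)
print(energy)
print(transmission)
```

To read the real file, the same function could be called with the contents returned by open('xray6286.dat').read().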

Upvotes: 0

Luke

Reputation: 5708

You can store the results in a data structure, then iterate through it and delete the empty entries. The best idea, though, is the one @alfasin suggested: use filter.
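A minimal sketch of the store-then-clean idea, assuming the rows have already been read with csv.reader(data, delimiter=' ') as in the question. The rows list here is copied from the question's output rather than read from the file, and the variable names are only illustrative.

```python
# Rows as csv.reader(delimiter=' ') returned them in the question.
rows = [
    ['', '', '', '', '5000.0', '', '', '', '', '', '0.52272E-07'],
    ['', '', '', '', '5250.0', '', '', '', '', '', '0.42227E-06'],
]

# Keep only the non-empty strings of every stored row.
cleaned = [[field for field in row if field] for row in rows]
print(cleaned)
```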

Upvotes: 0

Nir Alfasi

Reputation: 53525

Using if word: is perfectly fine. Another option would be to filter out the empty strings by replacing:

for word in line[0].split(' '):

with:

for word in filter(bool, line[0].split(' ')):
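For comparison, here is a small sketch of the filter approach on one data line, written for Python 3, where filter returns an iterator rather than a list. It also shows str.split() called with no argument, which is not part of the answer above but gives the same fields without any filtering. The line value is reconstructed from the question's output.

```python
# One data line, reconstructed from the output in the question.
line = '    5000.0      0.52272E-07'

# filter(bool, ...) drops the empty strings that split(' ') produces.
# In Python 3 it returns an iterator, so list() is needed for indexing.
via_filter = list(filter(bool, line.split(' ')))

# split() with no argument splits on runs of whitespace and ignores
# leading and trailing whitespace, so nothing has to be filtered out.
via_split = line.split()

print(via_filter)
print(via_split)
```

Both calls return the same two fields, so line.split() may be the shortest option when the columns never contain embedded spaces.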

Upvotes: 1
