brium-brium

Reputation: 837

Python read .txt-files header

I need to read some information from a txt file header which looks like this:

Date    20160122
SP Number   8
Gauge   250N internal
Total Height    61
SP Modell   SP2
Corner Distance 150 

Height  Value   Comment
60  NaN 
...

I have a Python program that currently does this:

depth, N = npy.loadtxt(filename, skiprows=8, unpack=True, usecols = usecols)

However, I would like to read some of the values from the header. Is there a way to do this? I am mostly interested in getting the value of "Total Height". In my search I only seem to find answers concerning .csv files.

Upvotes: 4

Views: 25621

Answers (4)

Greg

Reputation: 251

You can do it with the re module if the keys in the header file are always the same:

import re

with open(filename, 'r') as f:  # closes the file automatically
    data = f.read()
Total = re.findall(r'Total Height\s*([0-9]+)\s*\n', data)[0]  # a string, e.g. '61'

I had the same task of reading a header file and solved it with the re module.
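
A slightly more defensive variant of the same regex idea, as a minimal sketch: it anchors the key at the start of a line (re.MULTILINE), escapes the key, converts the match to int and returns None instead of raising IndexError when the key is missing. The helper name read_header_int is hypothetical, and it assumes the key is followed by whitespace and a plain integer.

```python
import re


def read_header_int(text, key):
    """Return the integer that follows `key` at the start of a line, or None.

    Assumes a line shaped like "Total Height    61" (key, spaces/tabs, integer).
    """
    # ^ and $ match at every line boundary because of re.MULTILINE;
    # re.escape protects keys that contain regex metacharacters.
    pattern = r'^' + re.escape(key) + r'[ \t]+(\d+)[ \t]*$'
    match = re.search(pattern, text, re.MULTILINE)
    return int(match.group(1)) if match else None


header_text = (
    "Date    20160122\n"
    "SP Number   8\n"
    "Gauge   250N internal\n"
    "Total Height    61\n"
    "SP Modell   SP2\n"
    "Corner Distance 150 \n"
    "\n"
    "Height  Value   Comment\n"
    "60  NaN \n"
)

print(read_header_int(header_text, "Total Height"))     # 61
print(read_header_int(header_text, "Corner Distance"))  # 150
print(read_header_int(header_text, "Missing Key"))      # None
```

With a real file you would pass in the result of f.read() instead of header_text.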

Upvotes: 0

jez

Reputation: 15359

This will do it, but with one caveat:

import numpy as npy

usecols = (0, 1)

header = {}
with open(filename, 'rt') as f:
    for header_lines, line in enumerate(f):
        line = line.strip()
        if not line: break # assume that a blank line means "header stops here"
        key, value = line.split(' ', 1)
        header[key] = value


depth, N = npy.loadtxt(filename, skiprows=header_lines + 2, unpack=True, usecols=usecols)

The problem is that the header format is ambiguous about what is key and what is value. Some keys appear to be multiple space-delimited words, and so are some values, yet a variable amount of whitespace is apparently the only rule separating key from value. In most cases there are at least 3 spaces between key and value, but Corner Distance is followed by only 1 space, so it is ambiguous (except to the human brain's own sophisticated context parser) where the key ends and the value begins.

Maybe the problem is just bad rendering (on this page, or during copy-paste into SO) of what are really supposed to be tabs. If so,

        key, value = line.split('\t', 1)

will solve the problem. But if not, the ambiguity in the file format needs to be resolved before a definitive solution can be written.
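
If the keys are known in advance, one way around the ambiguity is to match each line against the list of known key names instead of splitting on whitespace. A minimal sketch, assuming the key list below (copied from the sample header in the question); KNOWN_KEYS and parse_header are hypothetical names, not part of any library:

```python
# HYPOTHETICAL: the set of header keys is fixed and known in advance
# (taken from the sample header in the question).
KNOWN_KEYS = ["Date", "SP Number", "Gauge", "Total Height",
              "SP Modell", "Corner Distance"]


def parse_header(lines, known_keys=KNOWN_KEYS):
    """Split header lines into a {key: value} dict using known key names.

    Reading stops at the first blank line ("blank line ends the header").
    Raises ValueError for a line that starts with none of the known keys.
    """
    # Check longer keys first so that a key which is a prefix of another
    # key cannot match by accident.
    keys = sorted(known_keys, key=len, reverse=True)
    header = {}
    for line in lines:
        line = line.strip()
        if not line:
            break
        for key in keys:
            rest = line[len(key):]
            # The key must be followed by whitespace, not more characters.
            if line.startswith(key) and rest[:1].isspace():
                header[key] = rest.strip()
                break
        else:
            raise ValueError("unrecognised header line: %r" % line)
    return header


sample = ["Date    20160122\n", "SP Number   8\n", "Gauge   250N internal\n",
          "Total Height    61\n", "SP Modell   SP2\n", "Corner Distance 150 \n",
          "\n", "Height  Value   Comment\n"]
header = parse_header(sample)
print(header["Total Height"])     # 61
print(header["Corner Distance"])  # 150
```

Unlike line.split(' ', 1), this keeps multi-word keys such as Total Height intact and works whether the separator is one space, several spaces or a tab. The values are still strings, so convert them (e.g. int(header["Total Height"])) as needed.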

Upvotes: 0

eric.christensen

Reputation: 3241

I would use open() rather than npy.loadtxt:

with open(filename, 'r') as the_file:
    all_data = [line.strip() for line in the_file.readlines()]
    height_line = all_data[3]
    data = all_data[8:]

Then you can parse height_line to get the Total Height, and all the data lines from the file (as strings) will be in the variable data.
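
A minimal sketch of that parsing step, assuming the value is the last whitespace-separated token on the height line and that each data row starts with numeric Height and Value columns. parse_height_line and parse_data_rows are hypothetical helper names, and the second data row below is made up for illustration (only "60  NaN" appears in the question):

```python
def parse_height_line(height_line):
    # Assumes the value is the last whitespace-separated token on the line,
    # e.g. "Total Height    61" -> 61.
    return int(height_line.split()[-1])


def parse_data_rows(rows):
    """Convert "Height  Value   Comment" rows into (height, value) floats.

    Any Comment text after the second column is ignored, and blank rows
    are skipped. float() accepts the string "NaN".
    """
    parsed = []
    for row in rows:
        fields = row.split()
        if not fields:
            continue
        parsed.append((float(fields[0]), float(fields[1])))
    return parsed


height_line = "Total Height    61"
data = ["60  NaN", "59  1.5  some comment"]
print(parse_height_line(height_line))  # 61
print(parse_data_rows(data))
```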

Upvotes: 3

Christopher Shroba

Reputation: 7574

This should work!

field = "Total Height"

# Get first 6 lines
with open(filename) as file:
    lines = [next(file) for x in range(6)]

value = None
for line in lines:
    if line.startswith(field):
        # Get the part of the string after the field name
        end_of_string = line[len(field):]

        # Convert it to an int:
        value = int(end_of_string.strip())

print(value)  # Should print 61

If you know that the field names and values are separated by a tab character rather than spaces, you could use line.split('\t') instead to break each line into the field name and the field value. Then just check whether the field name is the one you care about and, if so, use the value; this avoids calling startswith and slicing the string to get its end.
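
The tab-based variant might look like this. A minimal sketch, assuming the fields really are tab-separated (the sample in the question shows spaces, which may or may not be tabs in the real file); find_field is a hypothetical helper name:

```python
def find_field(lines, field):
    """Return the value string for `field`, or None if it is absent.

    Assumes each header line has the form "<name><TAB><value>".
    """
    for line in lines:
        # maxsplit=1: only the first tab separates name from value.
        parts = line.rstrip("\n").split("\t", 1)
        if len(parts) == 2 and parts[0].strip() == field:
            # strip() also removes any extra tabs or trailing spaces.
            return parts[1].strip()
    return None


# Illustrative tab-separated header lines (an assumption, see above).
header_lines = ["Date\t20160122\n", "SP Number\t8\n",
                "Total Height\t61\n", "Corner Distance\t150 \n"]
print(int(find_field(header_lines, "Total Height")))  # 61
```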

Upvotes: 1
