Reputation: 837
I need to read some information from a txt file header which looks like this:
Date 20160122
SP Number 8
Gauge 250N internal
Total Height 61
SP Modell SP2
Corner Distance 150
Height Value Comment
60 NaN
...
I have a Python program currently doing this:
depth, N = npy.loadtxt(filename, skiprows=8, unpack=True, usecols = usecols)
However, I would also like to read some of the values from the header. Is there a way to do this? I am mostly interested in getting the value of "Total Height". My searches have only turned up answers about .csv files.
Upvotes: 4
Views: 25621
Reputation: 251
You can do it with the re module, if the keys in the header are always the same:
import re

with open(filename, 'r') as f:
    data = f.read()
# Capture the digits that follow "Total Height" (returned as a string)
Total = re.findall(r'Total Height\s*([0-9]+)\s*\n', data)[0]
I had the same task of reading a file header and did it with the re module.
Upvotes: 0
Reputation: 15359
This will do it, but with one caveat:
import numpy as npy

usecols = (0, 1)
header = {}
with open(filename, 'rt') as f:
    for header_lines, line in enumerate(f):
        line = line.strip()
        if not line:
            break  # assume that a blank line means "header stops here"
        key, value = line.split(' ', 1)
        header[key] = value
depth, N = npy.loadtxt(filename, skiprows=header_lines + 2, unpack=True, usecols=usecols)
The problem is that the header format is ambiguous about what is key and what is value. Some keys appear to be multiple space-delimited words, and so do some values, yet a variable amount of whitespace is apparently also the only thing separating key from value. As written, the code splits at the first space, so the SP Number line is stored under the key SP. In most cases there are 3 spaces between key and value, but Corner Distance is followed by only 1 space, so it is ambiguous (except to the human brain's own sophisticated context parser) where the key ends and the value begins.
Maybe the problem is just bad rendering (on this page, or during copy-paste into SO) of what are really supposed to be tabs. If so,
key, value = line.split('\t', 1)
will solve the problem. But if not, the ambiguity in the file format needs to be resolved before a definitive solution can be written.
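If the separators really are spaces, one possible way around the ambiguity is to declare the keys up front and match each line against them. This is only a sketch: it assumes the set of keys is fixed and is exactly the one shown in the question, and the names KNOWN_KEYS and parse_header are made up for illustration. The sample text stands in for the real file.

```python
import io

# Assumption: these are the only header keys, copied from the sample in the question.
KNOWN_KEYS = ("Date", "SP Number", "Gauge", "Total Height", "SP Modell", "Corner Distance")

def parse_header(lines):
    """Return a dict of header fields from an iterable of text lines (hypothetical helper)."""
    header = {}
    for line in lines:
        line = line.strip()
        for key in KNOWN_KEYS:
            # Require whitespace right after the key so a longer word is not mistaken for it.
            if line.startswith(key) and line[len(key):len(key) + 1].isspace():
                header[key] = line[len(key):].strip()
                break
    return header

# Stand-in for the real file, with the uneven spacing described above.
sample = io.StringIO(
    "Date   20160122\n"
    "SP Number   8\n"
    "Gauge   250N internal\n"
    "Total Height   61\n"
    "SP Modell   SP2\n"
    "Corner Distance 150\n"
    "\n"
    "Height Value Comment\n"
    "60 NaN\n"
)
header = parse_header(sample)
total_height = int(header["Total Height"])
```

Because the keys are matched as whole prefixes, it no longer matters how many spaces follow them or whether the value itself contains spaces. The price is that any key not in the list is silently ignored.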
Upvotes: 0
Reputation: 3241
I would use open rather than npy.loadtxt:
with open(filename, 'r') as the_file:
    all_data = [line.strip() for line in the_file.readlines()]
height_line = all_data[3]
data = all_data[8:]
Then you can parse height_line to get the Total Height, and all the data rows from the file will be in the variable data.
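For the parsing step, something along these lines might be enough. It is a sketch that assumes the value is a single token at the end of the line, which holds for Total Height but not for a line like Gauge. The example string is hypothetical and stands in for all_data[3].

```python
# Hypothetical example line; in the snippet above this would be all_data[3].
height_line = "Total Height   61"

# rsplit(None, 1) splits once, from the right, on any run of whitespace,
# so the key may contain spaces and the separator may be tabs or spaces.
key, raw_value = height_line.rsplit(None, 1)
total_height = int(raw_value)
```

Splitting from the right avoids having to know how many words the key has or how much whitespace separates it from the value.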
Upvotes: 3
Reputation: 7574
This should work!
field = "Total Height"
# Get first 6 lines
with open(filename) as file:
    lines = [next(file) for x in range(6)]
value = None
for line in lines:
    if line.startswith(field):
        # Get the part of the string after the field name
        end_of_string = line[len(field):]
        # Convert it to an int:
        value = int(end_of_string.strip())
print(value)  # Should print 61
If you know that the field names and values are separated by a tab character instead of spaces, you could instead use line.split('\t') to break each line into the field name and field value, then check whether the field name is the one you care about and, if so, use the value. That replaces using startswith and slicing the string to get the end of it.
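A rough sketch of that tab-based variant follows. It assumes the file really is tab-separated, which the question does not confirm, so the sample text is a hypothetical stand-in for the real header.

```python
import io

field = "Total Height"

# Hypothetical tab-separated header, standing in for the real file.
sample = io.StringIO(
    "Date\t20160122\n"
    "SP Number\t8\n"
    "Gauge\t250N internal\n"
    "Total Height\t61\n"
    "SP Modell\tSP2\n"
    "Corner Distance\t150\n"
)

value = None
for line in sample:
    # Split at the first tab only, so a value containing tabs stays in one piece.
    parts = line.rstrip("\n").split("\t", 1)
    # Lines without a tab produce a single part and are skipped.
    if len(parts) == 2 and parts[0].strip() == field:
        value = int(parts[1].strip())
        break
```

Comparing the whole field name, rather than using startswith, also avoids matching a different field that merely begins with the same text.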
Upvotes: 1