Reputation: 8277
I'm trying to read files in Python that all share the same structure, but an unusual one: it doesn't seem trivial to read with the default list
and parsing tools, though I'm sure it is possible.
So the structure is: int, space, int, space, double, space, then a very long string that itself contains spaces.
I need to store the two ints and the float (the file header) separately, then I'd like to keep the whole string as one monolithic block, because my data is encoded at the bit level in each of the characters (I hope I'm explaining it clearly...).
Naively using the .split() method doesn't help because of the spaces in the string. I've been thinking about joining all the elements returned by split() after the first three, but I'd lose information if there were double spaces in the string.
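To illustrate the concern, here is a minimal sketch with a made-up line (the sample text is only an assumption for demonstration). Splitting on whitespace and joining the tail back together silently collapses the double space:

```python
# Made-up example line; note the double space between "here" and "is".
line = "4 5 8.7 here  is a string"

# split() with no arguments treats any run of whitespace as one separator.
parts = line.split()

# Re-joining the tail with single spaces cannot restore the original spacing.
rejoined = " ".join(parts[3:])
print(repr(rejoined))  # 'here is a string'
```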
In C++, I'd be using >> for the ints and the double, then .get() for the characters. Are there equivalents in Python?
Upvotes: 1
Views: 87
Reputation: 6332
You can still use the .split() method. Since you know the format of the lines, you can pass in the maximum number of splits to be made.
str.split(sep=None, maxsplit=-1)
Parameters
sep -- the delimiter to split on; by default, any run of whitespace.
maxsplit -- the maximum number of splits to be made; by default there is no limit.
So in your case you should be able to do
line.split(' ', 3)
Which should split the line into four pieces: the two ints, the double, and the rest of the string with its spaces intact.
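As a minimal sketch of that idea, assuming each record sits on one line and the fields are separated by exactly one space: the helper name `parse_line` and the sample line are made up for illustration. Passing `' '` explicitly (rather than relying on the default whitespace splitting) keeps any leading spaces in the string part.

```python
def parse_line(line):
    """Split one record into (int, int, float, payload). Hypothetical helper."""
    # Remove only the line terminator; strip() would also eat spaces
    # that belong to the payload.
    line = line.rstrip('\n')
    # At most three splits, so everything after the third space stays whole.
    first, second, value, payload = line.split(' ', 3)
    return int(first), int(second), float(value), payload


# Made-up sample line with a double space inside the payload.
sample = "4 5 8.7 here  is a really long string\n"
a, b, x, payload = parse_line(sample)
print(a, b, x)
print(repr(payload))
```

If a line has fewer than four fields, the unpacking raises a `ValueError`, which may be a reasonable way to spot malformed records.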
Upvotes: 4
Reputation: 131
So the format for each line looks like this (I'm assuming that the string isn't separately escaped by quotes):
"4 5 8.7 here is a really long string"
In general, for more sophisticated parsing, it's recommended that you use regular expressions.
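import re
[...]
# two ints and a decimal number separated by single spaces, then the rest of the line untouched
pat = re.compile(r'([0-9]+) ([0-9]+) ([0-9.]+) (.*)')
for line in file:
    # let's say line is "4 5 8.7 here is a really long string"
    match = pat.match(line.rstrip('\n'))
    if match is None:
        continue  # the line does not follow the expected format
    matches_by_group = match.groups()  # Do something with this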
This way you'll have each separate piece in a tuple for each line. You can then cast the double, int, etc. as necessary.
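A rough sketch of the casting step, under a few assumptions: the numbers are non-negative, the double is written without an exponent, and the fields are separated by single spaces. The names `RECORD` and `parse_record` are hypothetical. The last group uses `.*` so that any character is accepted in the string part, not only word characters.

```python
import re

# Assumed layout: int, space, int, space, decimal number, space, payload.
# re.DOTALL lets the payload group accept every character.
RECORD = re.compile(r'(\d+) (\d+) (\d+(?:\.\d*)?) (.*)', re.DOTALL)


def parse_record(line):
    """Return (int, int, float, payload) for one line. Hypothetical helper."""
    match = RECORD.match(line.rstrip('\n'))
    if match is None:
        raise ValueError(f"unexpected line format: {line!r}")
    first, second, value, payload = match.groups()
    # The groups are all strings; cast the header fields explicitly.
    return int(first), int(second), float(value), payload


print(parse_record("4 5 8.7 here is a really long string\n"))
```

Compared with a plain split, the pattern also validates the header fields, at the cost of having to adjust it if the number format differs from the one assumed here.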
Upvotes: 1