Learning is a mess

Reputation: 8277

Reading a single-line file in Python without losing spaces

I'm trying to read files in Python that all share the same structure, but an unusual one: it doesn't seem trivial to read with the default list and parsing tools, though I'm sure it's possible. The structure is: int, space, int, space, double, space, then a very long string that itself contains spaces.

I need to store the two ints and the float (the file header) separately, and then I'd like to keep the whole string as one monolithic block, because my data is encoded at the bit level in each of its characters (I hope I'm explaining this clearly...).

Naively using the .split() method doesn't help because of the spaces in the string. I've been thinking about joining all the elements returned by split() after the first three, but I'd lose information if there were double spaces in the string.

In C++, I'd use >> for the ints and the double, then read the characters byte by byte (e.g. with .get()). Are there equivalents in Python?

Upvotes: 1

Views: 87

Answers (2)

Craicerjack

Reputation: 6332

You can still use the .split() function. Since you know the format of the lines, you can pass in the maximum number of splits to be made.

str.split(sep=None, maxsplit=-1)

Parameters
sep -- the delimiter; by default, any run of whitespace.
maxsplit -- the maximum number of splits to make.

So in your case, with the file's contents in a string called line, you should be able to do

line.split(' ', 3)

Which should split up into:

  • int
  • int
  • double
  • string
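As a minimal sketch of this approach, assuming the header fields are separated by exactly one space and the line may end with a newline (the function name parse_line and the sample line are made up for illustration):

```python
def parse_line(line):
    """Split a line into (int, int, float, payload) using at most 3 splits."""
    # Drop only the trailing newline; rstrip() with no argument would also
    # remove trailing spaces that belong to the payload.
    line = line.rstrip("\n")
    # An explicit " " separator splits on single spaces only, so double
    # spaces inside the payload (and a leading one) are preserved.
    first, second, value, payload = line.split(" ", 3)
    return int(first), int(second), float(value), payload


print(parse_line("4 5 8.7 here  is a really long string\n"))
# (4, 5, 8.7, 'here  is a really long string')
```

Passing the separator explicitly matters here: calling split(None, 3) also keeps the spaces inside the payload, but it strips any whitespace at the very start of the payload, which would lose data if the string can begin with a space.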

Upvotes: 4

Alfalfa

Reputation: 131

So the format for each line looks like this (I'm assuming that the string isn't separately escaped by quotes):

"4 5 8.7 here is a really long string"

In general, for more sophisticated parsing, it's recommended that you use regular expressions.

import re
[...]

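for line in file:
    # Let's say line is "4 5 8.7 here is a really long string"
    # (.*) captures the rest of the line verbatim, whatever characters it holds
    pat = r'([0-9]+)\s([0-9]+)\s([0-9.]+)\s(.*)'
    match = re.search(pat, line)
    if match:  # re.search returns None when the line doesn't fit the format
        matches_by_group = match.groups()  # Do something with this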

This way you'll have each separate piece in a tuple for each line. You can then cast the double, int, etc. as necessary.
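The casting step could look roughly like the sketch below. It assumes single spaces between the header fields and a payload that may contain any characters except a newline; the names LINE_RE and parse_with_regex are hypothetical, chosen only for this example.

```python
import re

# Header: two ints and a float, each followed by one space; the rest is payload.
# "." does not match a newline, so a trailing "\n" is left out of the payload.
LINE_RE = re.compile(r"([0-9]+) ([0-9]+) ([0-9.]+) (.*)")


def parse_with_regex(line):
    """Return (int, int, float, payload) or raise ValueError on a bad line."""
    match = LINE_RE.match(line)
    if match is None:
        raise ValueError("line does not match the expected format")
    # groups() returns strings, so the header fields are converted here.
    first, second, value, payload = match.groups()
    return int(first), int(second), float(value), payload


print(parse_with_regex("4 5 8.7 here is, a  really long string!\n"))
# (4, 5, 8.7, 'here is, a  really long string!')
```

Raising on a non-matching line is a design choice for the sketch; skipping or logging such lines may suit the real files better.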

Upvotes: 1
