Reputation: 78
I am facing an issue using the read_fwf command from the Python library pandas, same as described in this unresolved question
I want to read an ascii file containing results of numeric computations. The file contains repetitive header blocks with a variable number of lines between each of them and with a roughly described format but rather arbitrary string content. I do not want to allow any loss of information and therefore need to be able to process everything as it was originally.
Look at this examplary line:
$SUBTITLE= 48-ELEMENT CANTILEVER BEAM ON THE X-AXIS 1181
Using a
with pd.read_fwf(self.file_path, header=None,
chunksize=chunk_size, # no of lines read
colspecs=[(0, 23), (23, 41), (41, 59), (59, 72)],
engine='c', dtype={0: str}) as df_gen:
context manager to read the file, white spaces between columns will be stripped despite the seamlessly neighbouring column lengths. So it does not act as a fixed-width reader.
Cutting out the middle part of the line above with
"".join(s.dropna()).split("=")[-1].strip(" ")
with s being the row of the DataFrame containing that line, I will get a string
48-ELEMENT CANTILEVER BEAM ONTHE X-AXIS
instead of
48-ELEMENT CANTILEVER BEAM ON THE X-AXIS
.
Does anybody know an elegant solution to this issue? I am working on an approach where I will initially not cut into columns and do it for the numeric blocks after i extracted the header lines but it might increase computation time significantly.
Is there maybe a keyword in newer pandas versions that will disable the stripping?
Upvotes: 2
Views: 2011
Reputation: 13242
Take bigger cuts:
s = "$SUBTITLE= 48-ELEMENT CANTILEVER BEAM ON THE X-AXIS 1181"
pd.read_fwf(StringIO(s), header=None, colspecs=[(0, 11),(11, 51),(51,80)]
Output:
0 1 2
0 $SUBTITLE= 48-ELEMENT CANTILEVER BEAM ON THE X-AXIS 1181
Or you can just read in line by line:
pd.read_table(StringIO(s), header=None)
...
0
0 $SUBTITLE= 48-ELEMENT CANTILEVER BEAM ON THE X-AXIS 1181
A little regex magic and you can consistently extract the inner line:
re.findall('.*= (.+?)\s\s', s)
...
['48-ELEMENT CANTILEVER BEAM ON THE X-AXIS']
Upvotes: 1