Reputation: 335
I have a dataset in a textfile that looks like this.
0 0CF00400 X 8 66 7D 91 6E 22 03 0F 7D 0.021650 R
0 18EA0080 X 3 E9 FE 00 0.022550 R
0 00000003 X 8 D5 64 22 E1 FF FF FF F0 0.023120 R
I read this using
file_pandas = pd.read_csv(fileName, delim_whitespace = True, header = None, engine = 'python')
And got the output
0 0 0CF00400 X 8 66 7D 91 6E 22 03 0F 7D 0.02165
1 0 18EA0080 X 3 E9 FE 0 0.022550 R None None None NaN
2 0 00000003 X 8 D5 64 22 E1 FF FF FF F0 0.02312
But I want this read as
0 0 0CF00400 X 8 66 7D 91 6E 22 03 0F 7D 0.021650 R
1 0 18EA0080 X 3 E9 FE 00 0.022550 R
2 0 00000003 X 8 D5 64 22 E1 FF FF FF F0 0.023120 R
I've tried removing delim_whitespace = True
and replacing it with delimiter = " "
but that just combined the first four columns in the output shown above, but it did parse the rest of the data correctly, meaning that the rest of the columns were like the origin txt file (barring the NaN values in whitespaces).
I'm not sure how to proceed from here.
Side note: the 00
is being parsed as only 0
. Is there a way to display 00
instead?
Upvotes: 5
Views: 1170
Reputation: 215047
It seems like your data is fixed width columns, you can try pandas.read_fwf()
:
from io import StringIO
import pandas as pd
df = pd.read_fwf(StringIO("""0 0CF00400 X 8 66 7D 91 6E 22 03 0F 7D 0.021650 R
0 18EA0080 X 3 E9 FE 00 0.022550 R
0 00000003 X 8 D5 64 22 E1 FF FF FF F0 0.023120 R"""),
header = None, widths = [1,12,2,8,4,4,4,4,4,4,4,4,16,2])
Upvotes: 8