Reputation: 146
I'm having difficulties getting pandas to read a .txt-file whose first column contains sections of data, leaving some of its rows blank. The raw data is separated by multiple whitespaces of different lengths, and has format:
Sensor Value Name Case
sec01 379 S01 H4
sec02 1183 S02 A1
948 S03 A1
6948 S04 A1
2 S05 C1
sec03 -722 S06 E1
-466 S07 B1
-289 S08 F1
-105 S09 F1
Where the sensor column is filled with whitespace for indentation where there is no "sensor" cell. Passing:
df = pd.read_csv(filename,
sep = '\s+',
header = None,
skipinitialspace = False,
encoding = 'unicode_escape',
engine = 'python'
)
, results in output of the form:
Sensor Value Name Case
sec01 379 S01 H4
sec02 1183 S02 A1
948 S03 A1 None
6948 S04 A1 None
2 S05 C1 None
sec03 -722 S06 E1
-466 S07 B1 None
-289 S08 F1 None
-105 S09 F1 None
Where columns get mixed up. The desired output is to have None
or NaN
for missing "sensor" cells.
What would be the correct separator/delimiter input in this case? Or does .txt-file need to be rearranged beforehand?
Upvotes: 0
Views: 366
Reputation: 352
Your data is fixed-width so you can use the pd.read_fwf
function to read in the data.
df = pd.read_fwf(filename)
Pretty useful read in method if you find yourself with well-structured data- Docs
Upvotes: 1
Reputation: 8279
You can use pandas read_fwf
to read the data as fixed-width formatted lines and then forward-fill the null values in Sensor
:
import pandas as pd
from io import StringIO
data = """Sensor Value Name Case
sec01 379 S01 H4
sec02 1183 S02 A1
948 S03 A1
6948 S04 A1
2 S05 C1
sec03 -722 S06 E1
-466 S07 B1
-289 S08 F1
-105 S09 F1 """
df = pd.read_fwf(StringIO(data))
df["Sensor"] = df.Sensor.fillna(method="ffill")
df
Result:
Sensor Value Name Case
0 sec01 379 S01 H4
1 sec02 1183 S02 A1
2 sec02 948 S03 A1
3 sec02 6948 S04 A1
4 sec02 2 S05 C1
5 sec03 -722 S06 E1
6 sec03 -466 S07 B1
7 sec03 -289 S08 F1
8 sec03 -105 S09 F1
Upvotes: 3