Reputation: 133
While trying to import data from a .txt file using pandas read_table()
, I came across an interesting issue.
The file contains a column only partly filled with strings (OrigVol). The empty cells are then filled with values from the following column and NaNs are added at the end, where this shift leads to empty cells. The image below shows the result of using pd.read_table( file.txt, sep = r'\s+')
after import pandas as pd
.
The column OrigVol should contain empty strings, or NaN's, instead of the column prim_pos_eutags. I'm new to panda and it's diffcult to come up with a MWE that exactly reproduces the problem. One would need the file (which I couldn't upload).
Maybe I missed to specify an import option. Let me know if more information is required.
Any hints are much appreciated!
According to the suggestion, I provide the first 8 lines of an example file
Name Subtype ProcName Material Creator OrigVol trklen prim_pos_eutags
top 0 initStep undefined undefined top 0 prim_pos_eudata
top 91 Transportation Vacuum undefined 2.27009877562523E-12 prim_pos_eudata
QC5L_2_v 91 Transportation Vacuum undefined 3.50000000000227 prim_pos_eudata
DRIFT_8609_v 91 Transportation Vacuum undefined 3.80000000000518 prim_pos_eudata
BC1L_2_v 23 SynRad Vacuum undefined 68.1607816456518 prim_pos_eudata
BC1L_2_v 91 Transportation Vacuum undefined 79.0910350747966 prim_pos_eudata
DRIFT_8610_v 91 Transportation Vacuum undefined QC2_v 79.3910346856657 prim_pos_eudata
Upvotes: 0
Views: 171
Reputation: 133
Thanks to the suggestions by Anant Mittal, the problem could be solved.
However, I did some minor changes to achieve a clean data frame with the first line as header.
input_file = open('test.txt', "r")
data = []
for line in input_file:
splitted_list = line.split()
if len(splitted_list) == 8: data.append( splitted_list )
if len(splitted_list) < 8: # 8 is number of columns, change it if it is wrong
splitted_list.insert(5, "") # add an empty string inn the 5th column (change the number if it is wrong)
data.append(splitted_list)
test = pd.DataFrame.from_records( data )
# first row to select header
header = test.iloc[0]
# exclude first line
test = test[1:]
# reset the header
test.columns = header
test
Upvotes: 1
Reputation: 2113
The columns are separated with variable number of spaces to make the columns look nice and aligned in the text file. What you can do instead is to read the file line by line, split the line and see if the number of elements in the list generated with line.split()
are less than number of columns. If yes, than add an empty string at the required index.
input_file = open('test.txt', "rb")
data = []
for line in input_file:
splitted_list = line.split()
if len(splitted_list) < 8: # 8 is number of columns, change it if it is wrong
splitted_list.insert(5, "") # add an empty string inn the 5th column (change the number if it is wrong)
data.append(splitted_list)
df = pd.DataFrame.from_records(data)
Upvotes: 0