chiser

Reputation: 161

Importing a .dat file to a dataframe instead of a list of strings in python

I am trying to import a .dat file produced by my experiments. It starts with metadata in the header lines, followed by the experimental data itself (everything after the line of dashes). My idea is to split it into two variables: a list of strings holding the metadata, and a dataframe holding the results (the part below the dashes). The problem is that when I read the file, the metadata at the top is stored as a list of strings, so the whole file ends up in that format. Is there a way to get the data part as a dataframe rather than as a list of strings?

Learned-Helplesness-Experiment  (TriplePlatform)  from      05.04.2017         13:41:24

software version:   DoublePlatform_1.3 04-Jun-2014

Setup of Experiment:    

Platform 1: 
ExpType:    M   M   M   M   M   M   M   M   M   M   

heated side:    right   right   right   right   right   right   right       right   right   right   

PIs:     n. def.     0   0   0   0   0   0   0   0   0  

Platform 2: 
ExpType:    Te  Te  Te  Y   Te  Y   Y   Y   Y   Y   

heated side:    right   right   right   ->M right   ->M ->M ->M ->M ->M 

PIs:     n. def.     0   0   0   0   0   0   0   0   0  

Platform 3: 
ExpType:    Y   Y   Y   Y   M_S Y   Y   Y   Y   Y   

heated side:    ->M ->M ->M ->M right   ->M ->M ->M ->M ->M 

PIs:     n. def.     0   0   0   0   0   0   0   0   0  


------------------------------------    ------------------------------------

 0   0   0   0   0
 1   47 -0.3759766   0.1123047   0.3710938
 2   97  0.01953125 -0.1318359   0.1123047
 3   157    -0.4150391   0.2246094   0.3369141
 4   207    -0.01953125 -0.2539063   0.1318359
 5   257    -0.3515625   0.3027344   0.3222656

Upvotes: 0

Views: 334

Answers (1)

akoeltringer

Reputation: 1721

I assume you are using pandas? I don't think there is a general way of doing this. You could open and parse the file manually up to the line of dashes and keep that part as a list of strings. Then you tell pandas to import the rest, starting at line number x (where you found the dashes). The read_csv option for this is called skiprows.

Edit1 (in response to the comment):

That depends on whether your header has a constant number of rows. If not, you might want to read through the file line by line, looking for the dashes:

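with open('filename', 'r') as file:
    header = []   # metadata lines found above the dashes
    line_no = 0
    for line in file:   # iterate over lines; file.read() would yield single characters
        line_no += 1
        if line.startswith('-----'):
            # separator found: line_no now counts the header plus the dash line
            break
        else:
            header.append(line.rstrip('\n'))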

Edit2

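To import the data part, you could use

pandas.read_csv(..., sep='\t', header=None, skiprows=line_no)

if tab is the field delimiter, or

pandas.read_csv(..., sep=r'\s+', header=None, skiprows=line_no)

if the fields are delimited by one or more whitespace characters. header=None keeps pandas from using the first data row as column names. (delim_whitespace=True does the same as sep=r'\s+' but is deprecated in recent pandas versions.)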
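
Putting the two steps together, a minimal sketch might look like the following. It assumes the separator line starts with at least five dashes and that the data columns are whitespace-delimited, as in the sample in the question. split_dat and load_dat are hypothetical helper names, not pandas functions. Instead of skiprows, this variant hands the text below the dashes to read_csv through io.StringIO, so no line counting is needed.

```python
import io


def split_dat(text, marker='-----'):
    """Split file contents into (metadata lines, data text) at the dash line.

    Assumption: the first line that starts with `marker` is the separator.
    """
    lines = text.splitlines()
    for i, line in enumerate(lines):
        if line.startswith(marker):
            # Everything above the separator is metadata,
            # everything below it is the data block.
            return lines[:i], "\n".join(lines[i + 1:])
    raise ValueError("no separator line found")


def load_dat(path):
    """Return (metadata as list of strings, data as DataFrame)."""
    # Imported here so that split_dat itself has no third-party dependency.
    import pandas as pd

    with open(path, 'r') as f:
        metadata, data_text = split_dat(f.read())

    # header=None: the data block has no column names of its own.
    # sep=r'\s+': columns are separated by runs of blanks and/or tabs.
    df = pd.read_csv(io.StringIO(data_text), sep=r'\s+', header=None)
    return metadata, df
```

Calling metadata, df = load_dat('experiment.dat') (the file name is only a placeholder) would then give the header as a list of strings and the measurements as a dataframe with integer column labels, which can be renamed afterwards.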

Upvotes: 1
