Olivier Ma
Olivier Ma

Reputation: 1309

pandas: how to read file with divided rows

I want to read a dat file into pandas (I'm using read_table()), in the file each row of observation is divided into multiple rows like this:

var1 var2 var3
var4 var5 var6
1    2    3
4    5    6
1    2    3
4    5    6
1    2    3
4    5    6

in this case, each observation is divided into two rows.

Upvotes: 3

Views: 299

Answers (1)

jezrael
jezrael

Reputation: 863146

You can use parameter skiprows and header in function read_csv:

import pandas as pd
import io

temp=u"""var1 var2 var3
var4 var5 var6
1    2    3
4    5    6
1    2    3
4    5    6
1    2    3
4    5    6"""
print range(0, 10)[1::2]
[1, 3, 5, 7, 9]

#after testing replace io.StringIO(temp) to filename
df1 = pd.read_csv(io.StringIO(temp), 
                  sep='\s+', 
                  header=0, 
                  index_col=None, 
                  skiprows=(range(0, 10)[1::2]))
print df1
   var1  var2  var3
0     1     2     3
1     1     2     3
2     1     2     3

print range(2, 10)[::2]
[2, 4, 6, 8]

#after testing replace io.StringIO(temp) to filename
df2 = pd.read_csv(io.StringIO(temp), 
                  sep='\s+', 
                  header=1, 
                  index_col=None, 
                  skiprows=(range(2, 10)[::2]))
print df2
   var4  var5  var6
0     4     5     6
1     4     5     6
2     4     5     6

Or you can use postprocessing (as Paul suggested) with header=[0,1] in read_csv and function droplevel. Last you can reset_index:

import pandas as pd
import io

temp=u"""var1 var2 var3
var4 var5 var6
1    2    3
4    5    6
1    2    3
4    5    6
1    2    3
4    5    6"""

df = pd.read_csv(io.StringIO(temp), 
                  sep='\s+', 
                  header=[0,1])
print df      
  var1 var2 var3
  var4 var5 var6
0    1    2    3
1    4    5    6
2    1    2    3
3    4    5    6
4    1    2    3
5    4    5    6 
df1 = df[::2] 
df1.columns=df.columns.droplevel(1)
print df1.reset_index(drop=True) 
   var1  var2  var3
0     1     2     3
1     1     2     3
2     1     2     3


df1 = df[1::2] 
df1.columns=df.columns.droplevel(0)
print df1.reset_index(drop=True) 
   var4  var5  var6
0     4     5     6
1     4     5     6
2     4     5     6

Upvotes: 1

Related Questions