Reputation: 1309
I want to read a dat
file into pandas
(I'm using read_table()
), in the file each row of observation is divided into multiple rows like this:
var1 var2 var3
var4 var5 var6
1 2 3
4 5 6
1 2 3
4 5 6
1 2 3
4 5 6
in this case, each observation is divided into two rows.
Upvotes: 3
Views: 299
Reputation: 863146
You can use parameter skiprows
and header
in function read_csv
:
import pandas as pd
import io
temp=u"""var1 var2 var3
var4 var5 var6
1 2 3
4 5 6
1 2 3
4 5 6
1 2 3
4 5 6"""
print range(0, 10)[1::2]
[1, 3, 5, 7, 9]
#after testing replace io.StringIO(temp) to filename
df1 = pd.read_csv(io.StringIO(temp),
sep='\s+',
header=0,
index_col=None,
skiprows=(range(0, 10)[1::2]))
print df1
var1 var2 var3
0 1 2 3
1 1 2 3
2 1 2 3
print range(2, 10)[::2]
[2, 4, 6, 8]
#after testing replace io.StringIO(temp) to filename
df2 = pd.read_csv(io.StringIO(temp),
sep='\s+',
header=1,
index_col=None,
skiprows=(range(2, 10)[::2]))
print df2
var4 var5 var6
0 4 5 6
1 4 5 6
2 4 5 6
Or you can use postprocessing (as Paul
suggested) with header=[0,1]
in read_csv
and function droplevel
. Last you can reset_index
:
import pandas as pd
import io
temp=u"""var1 var2 var3
var4 var5 var6
1 2 3
4 5 6
1 2 3
4 5 6
1 2 3
4 5 6"""
df = pd.read_csv(io.StringIO(temp),
sep='\s+',
header=[0,1])
print df
var1 var2 var3
var4 var5 var6
0 1 2 3
1 4 5 6
2 1 2 3
3 4 5 6
4 1 2 3
5 4 5 6
df1 = df[::2]
df1.columns=df.columns.droplevel(1)
print df1.reset_index(drop=True)
var1 var2 var3
0 1 2 3
1 1 2 3
2 1 2 3
df1 = df[1::2]
df1.columns=df.columns.droplevel(0)
print df1.reset_index(drop=True)
var4 var5 var6
0 4 5 6
1 4 5 6
2 4 5 6
Upvotes: 1