Rishik Mani

Reputation: 498

Unable to read the first row of a .dat file using pandas

I have a .dat file whose origin I am not sure of. I need to read it in order to perform PCA. Assuming it is a whitespace-delimited file, I was able to read its contents and drop the first column (since it is an index), but the very first row is missing from the result. Below is the code:

import numpy as np
import pandas as pd
from numpy import array

myarray = pd.read_csv('hand_postures.dat', delim_whitespace=True)
myarray = array(myarray)
print(myarray.shape)
myarray = np.delete(myarray,0,1)
print(myarray)
print(myarray.shape)

The file is shared at the link https://drive.google.com/open?id=0ByLV3kGjFP_zekN1U1c3OGFrUnM. Can someone help me point out my mistake?

Upvotes: 0

Views: 1034

Answers (1)

cs95

Reputation: 402483

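You need extra parameters when calling pd.read_csv. By default it treats the first row of the file as the column header, which is why that row is missing from your array.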

df = pd.read_csv('hand_postures.dat', header=None, delim_whitespace=True, index_col=[0])

df.head()

         1          2        3        4         5        6        7        8   \
0                                                                               
0 -65.55560   0.172413  44.4944  22.2472  0.000000  50.6723  34.3434  17.1717   
1 -65.55560   2.586210  43.8202  21.9101  0.277778  51.4286  34.3434  17.1717   
2 -45.55560   5.000000  43.8202  21.9101  0.833333  56.7227  42.4242  21.2121   
3   5.55556  -2.241380  46.5169  23.2584  1.111110  70.3361  85.8586  42.9293   
4  67.77780  20.689700  59.3258  29.6629  2.222220  80.9244  93.9394  46.9697   

         9        10       11       12        13       14        15       16  \
0                                                                              
0 -0.235294  54.6154  39.7849  19.8925  0.705883  37.2656   41.3043  20.6522   
1 -0.235294  55.3846  38.7097  19.3548  0.705883  38.6719   41.3043  20.6522   
2  0.000000  63.0769  47.3118  23.6559  0.000000  47.8125   54.3478  27.1739   
3 -0.117647  83.8462  90.3226  45.1613  0.352941  73.1250   92.3913  46.1957   
4  0.117647  93.8462  98.9247  49.4624 -0.352941  89.2969  100.0000  50.0000   

     17       18        19       20  
0                                    
0  15.0  34.6584   54.1270  27.0635  
1  14.4  35.2174   55.8730  27.9365  
2  14.4  43.6025   69.8413  34.9206  
3   3.6  73.7888   94.2857  47.1429  
4  -1.2  92.2360  106.5080  53.2540  
  • header=None specifies that the first row is part of the data (and not the header)
  • index_col=[0] specifies that the first column is to be treated as the index
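To see the effect of header=None without the original file, here is a minimal sketch on an in-memory sample. The three rows below are a made-up stand-in for hand_postures.dat (only the first few columns, values copied from the output above), and sep=r"\s+" is used as the equivalent of delim_whitespace=True, which newer pandas releases deprecate.

```python
import io

import pandas as pd

# Hypothetical three-row sample standing in for hand_postures.dat:
# the first column is a row index, the rest are measurements.
sample = io.StringIO(
    "0 -65.5556 0.172413 44.4944\n"
    "1 -65.5556 2.58621 43.8202\n"
    "2 -45.5556 5.0 43.8202\n"
)

# Default behaviour: the first row is consumed as column names.
with_default = pd.read_csv(sample, sep=r"\s+")

sample.seek(0)  # rewind the buffer so it can be parsed again

# header=None keeps the first row as data;
# index_col=0 uses the first column as the index.
df = pd.read_csv(sample, sep=r"\s+", header=None, index_col=0)

# NumPy array for downstream work such as PCA; the index column
# is already excluded, so no np.delete call is needed.
data = df.to_numpy()

print(with_default.shape)  # (2, 4): one row lost to the header
print(df.shape)            # (3, 3): all rows kept, index set aside
print(data.shape)          # (3, 3)
```

Since the index column is set aside by index_col, df.to_numpy() gives the data array directly, in place of the array(...) and np.delete(...) steps in the question.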

Upvotes: 1
