Jan

Reputation: 1479

Pandas dataframe does not separate columns according to comma in csv

I want to build a matrix from a table stored in a CSV file:

COEFFICIENT MATRIX 
,0,1,2,3,4
0,0.00876623398408,0.525189723661,0.528495953628,0.94228622319,0.0379073884588
1,0.434693398364,0.77017930965,0.00847865052462,0.544319471939,0.858970329817
2,0.978091233581,0.900800004769,0.504567295427,0.65499490009,0.397203736755
3,0.671510258373,0.554713361673,0.377098128478,0.246977226206,0.535900353082
...
5000,0.791781572037,0.70262685963,0.218775600741,0.19802280762,0.68177855465

I'm using pandas to read the CSV and return a matrix. Instead of the expected shape of 5001 x 5, I get 5002 x 1.

How can I make pandas split the rows into the right number of columns at the commas, and not count the header line (the one after the table title) as the first data row?

 input = pd.read_csv(coeff_file, skiprows=0)
 input_mat = input.as_matrix()

 print input.shape
 print type(input)

 print input_mat.shape
 print type(input_mat)

This prints:

(5002, 1)
<class 'pandas.core.frame.DataFrame'>
(5002, 1)
<type 'numpy.ndarray'>

Upvotes: 1

Views: 6116

Answers (1)

jezrael

Reputation: 862511

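The first line of the file, COEFFICIENT MATRIX, is a title rather than a header, so it has to be skipped. I think you need one of the skiprows=1, skiprows=[0] or header=1 parameters in read_csv, plus index_col=0 so the first column becomes the row index instead of a data column: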

df = pd.read_csv(coeff_file, skiprows=1, index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779

df = pd.read_csv(coeff_file, header=1, index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779

df = pd.read_csv(coeff_file, skiprows=[0], index_col=0)
print (df)
             0         1         2         3         4
0     0.008766  0.525190  0.528496  0.942286  0.037907
1     0.434693  0.770179  0.008479  0.544319  0.858970
2     0.978091  0.900800  0.504567  0.654995  0.397204
3     0.671510  0.554713  0.377098  0.246977  0.535900
5000  0.791782  0.702627  0.218776  0.198023  0.681779
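
For completeness, here is a minimal, self-contained sketch of the whole round trip, from CSV text to a NumPy matrix. The csv_text string is a made-up, shortened stand-in for coeff_file, and the variable names are only illustrative. It also uses to_numpy() instead of as_matrix(), because as_matrix() was removed in pandas 1.0.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for coeff_file: a title line, a header line, then data.
csv_text = """COEFFICIENT MATRIX
,0,1,2
0,0.1,0.2,0.3
1,0.4,0.5,0.6
"""

# Skip the title line; the next line becomes the header,
# and the first column becomes the row index.
df = pd.read_csv(StringIO(csv_text), skiprows=1, index_col=0)

# Equivalent call: take the header from line 1 (0-based) and drop what precedes it.
df_alt = pd.read_csv(StringIO(csv_text), header=1, index_col=0)

# Convert the frame to a plain NumPy array.
mat = df.to_numpy()

print(df.shape)
print(mat.shape)
print(type(mat))
```

With the real file, the same calls should give a frame and matrix with one row per data line and five columns.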

Upvotes: 1
