Kenenbek Arzymatov

Reputation: 9109

Can't read csv data correctly

I have the code below. wine.csv is the data from here, with a header row of column names that I added myself:

import pandas

data = pandas.read_csv("wine.csv")
df = pandas.DataFrame(data, columns=['W', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'])
print(df)

But the resulting DataFrame contains only NaNs.

Do you have any advice on how to solve this problem?

Upvotes: 0

Views: 301

Answers (2)

jezrael

Reputation: 862406

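You can pass the URL directly to read_csv and set the column names with the names parameter. The file has 14 columns, so with 13 names the first column (the class label) becomes the index: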

import pandas

df = pandas.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", names=['W', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'])
print(df.head())
       W     E     R     T    Y     U     I     O     P     A     S     D  \
1  14.23  1.71  2.43  15.6  127  2.80  3.06  0.28  2.29  5.64  1.04  3.92   
1  13.20  1.78  2.14  11.2  100  2.65  2.76  0.26  1.28  4.38  1.05  3.40   
1  13.16  2.36  2.67  18.6  101  2.80  3.24  0.30  2.81  5.68  1.03  3.17   
1  14.37  1.95  2.50  16.8  113  3.85  3.49  0.24  2.18  7.80  0.86  3.45   
1  13.24  2.59  2.87  21.0  118  2.80  2.69  0.39  1.82  4.32  1.04  2.93   

      F  
1  1065  
1  1050  
1  1185  
1  1480  
1   735 
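For context, here is a minimal sketch of one likely reason the code in the question printed only NaNs. It assumes the header names added to wine.csv did not exactly match the labels passed to columns=; the header names used below ('class', 'alcohol', 'malic_acid') are made up for illustration and are not the asker's real header.

```python
import io

import pandas as pd

# Hypothetical stand-in for wine.csv: a header row plus two data rows.
csv_text = "class,alcohol,malic_acid\n1,14.23,1.71\n1,13.20,1.78\n"

data = pd.read_csv(io.StringIO(csv_text))

# Passing an existing DataFrame together with columns= selects columns by label.
# Labels that do not exist in `data` come back as all-NaN columns.
mismatched = pd.DataFrame(data, columns=["W", "E"])

# Assigning new labels renames the columns and keeps the values.
renamed = data.copy()
renamed.columns = ["C", "W", "E"]

print(mismatched)
print(renamed)
```

So columns= on the DataFrame constructor does not rename anything; to rename, assign to df.columns or pass names= to read_csv.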

If you need only columns E to F, name all 14 columns and filter them with the usecols parameter. wine.data has no header row, so pass header=None; header=0 would treat the first data row as a header and drop it:

import pandas

df = pandas.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", 
                     header=None,
                     index_col=0,
                     names=['IDX', 'W', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'],
                     usecols=['IDX', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'])
print(df.head())
        E     R     T    Y     U     I     O     P     A     S     D     F
IDX                                                                       
1    1.71  2.43  15.6  127  2.80  3.06  0.28  2.29  5.64  1.04  3.92  1065
1    1.78  2.14  11.2  100  2.65  2.76  0.26  1.28  4.38  1.05  3.40  1050
1    2.36  2.67  18.6  101  2.80  3.24  0.30  2.81  5.68  1.03  3.17  1185
1    1.95  2.50  16.8  113  3.85  3.49  0.24  2.18  7.80  0.86  3.45  1480
1    2.59  2.87  21.0  118  2.80  2.69  0.39  1.82  4.32  1.04  2.93   735
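If you want to try the names/usecols combination without downloading anything, a small offline sketch could look like this. The sample text is just the first four fields of the first two rows shown above, and the shortened names list is an assumption made for the example.

```python
import io

import pandas as pd

# First four fields of the first two wine.data rows shown above.
raw = "1,14.23,1.71,2.43\n1,13.20,1.78,2.14\n"
names = ["IDX", "W", "E", "R"]

df = pd.read_csv(
    io.StringIO(raw),
    header=None,                # the sample has no header row
    names=names,                # label every column in the file
    usecols=["IDX", "E", "R"],  # keep only these labels, so W is skipped
    index_col="IDX",            # use the class label as the index
)
print(df)
```

Every label in usecols has to appear in names, otherwise read_csv raises an error.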

EDIT:

If you don't want to use the first column as the index, pass index_col=None and remove IDX from the list in usecols. As the file has no header row, header=None keeps the first data row:

import pandas

df = pandas.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", 
                     header=None,
                     index_col=None,
                     names=['IDX', 'W', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'],
                     usecols=['E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'])

print(df.head())
      E     R     T    Y     U     I     O     P     A     S     D     F
0  1.71  2.43  15.6  127  2.80  3.06  0.28  2.29  5.64  1.04  3.92  1065
1  1.78  2.14  11.2  100  2.65  2.76  0.26  1.28  4.38  1.05  3.40  1050
2  2.36  2.67  18.6  101  2.80  3.24  0.30  2.81  5.68  1.03  3.17  1185
3  1.95  2.50  16.8  113  3.85  3.49  0.24  2.18  7.80  0.86  3.45  1480
4  2.59  2.87  21.0  118  2.80  2.69  0.39  1.82  4.32  1.04  2.93   735
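A small sketch of how header interacts with names on a file without a header row, such as wine.data. The three-field sample and the shortened names list are assumptions for illustration only.

```python
import io

import pandas as pd

# Three rows, no header line (first three fields of the wine.data rows above).
raw = "1,14.23,1.71\n1,13.20,1.78\n1,13.16,2.36\n"
names = ["IDX", "W", "E"]

# header=0 says "line 0 is a header"; names then replaces it,
# so the first data row is discarded.
with_header0 = pd.read_csv(io.StringIO(raw), header=0, names=names)

# header=None says "there is no header line"; every row is kept.
with_no_header = pd.read_csv(io.StringIO(raw), header=None, names=names)

print(len(with_header0), with_header0["W"].iloc[0])
print(len(with_no_header), with_no_header["W"].iloc[0])
```

Use header=0 together with names only when the file really starts with a header line you want to replace, as in a wine.csv with a hand-added header.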

Upvotes: 3

Richard Rublev

Reputation: 8164

I think you are looking for this: read the file first, then select the columns you need from the DataFrame.

import pandas

df = pandas.read_csv("https://archive.ics.uci.edu/ml/machine-learning-databases/wine/wine.data", names=['W', 'E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F'])

s = df[['E', 'R', 'T', 'Y', 'U', 'I', 'O', 'P', 'A', 'S', 'D', 'F']]
print(s)

I got:

       E     R     T    Y     U     I     O     P          A     S     D     F
1   1.71  2.43  15.6  127  2.80  3.06  0.28  2.29   5.640000  1.04  3.92  1065
1   1.78  2.14  11.2  100  2.65  2.76  0.26  1.28   4.380000  1.05  3.40  1050
1   2.36  2.67  18.6  101  2.80  3.24  0.30  2.81   5.680000  1.03  3.17  1185
1   1.95  2.50  16.8  113  3.85  3.49  0.24  2.18   7.800000  0.86  3.45  1480
1   2.59  2.87  21.0  118  2.80  2.69  0.39  1.82   4.320000  1.04  2.93   735
1   1.76  2.45  15.2  112  3.27  3.39  0.34  1.97   6.750000  1.05  2.85  1450
1   1.87  2.45  14.6   96  2.50  2.52  0.30  1.98   5.250000  1.02  3.58  1290
1   2.15  2.61  17.6  121  2.60  2.51  0.31  1.25   5.050000  1.06  3.58  1295
1   1.64  2.17  14.0   97  2.80  2.98  0.29  1.98   5.200000  1.08  2.85  1045
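Selecting after loading can be written in a few equivalent ways. The sketch below uses a tiny in-memory sample (first four fields of two rows, with a shortened names list chosen for the example) rather than the full file.

```python
import io

import pandas as pd

raw = "1,14.23,1.71,2.43\n1,13.20,1.78,2.14\n"
df = pd.read_csv(io.StringIO(raw), header=None, names=["IDX", "W", "E", "R"])

# Explicit list of labels, as in the answer above.
by_list = df[["E", "R"]]

# Label slice with .loc; both endpoints are included.
by_slice = df.loc[:, "E":"R"]

# Drop the columns that are not wanted instead.
by_drop = df.drop(columns=["IDX", "W"])

print(by_list)
```

The label slice is convenient when the wanted columns are contiguous, as E to F are in the wine data.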

Upvotes: 1
