Reputation: 21353
I am reading in data with
df = pandas.read_csv("file.csv", names=['A','B','C','D','E','F','G', 'H','I','J', 'K'], header=None)
I get
df.dtypes
Out[54]:
A int64
B object
C int64
D int64
E object
F object
G object
H object
I object
J object
K object
dtype: object
The problem is that some fields in the original data were replaced with the string SUPP wherever the value was less than 6 (but more than 0), so those columns are not read as numeric types. I tried replacing them with
df.replace('SUPP', 3.0)
but I still don't get numerical data types.
Some typical input data looks like
931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
931,Oxfordshire,9316076,123298,Our Lady's Abingdon,Abingdon,57,57,SUPP,SUPP,16
The problem can be reproduced by just saving the example above as file.csv.
Upvotes: 0
Views: 210
Reputation: 28956
EdChum almost had it in the comments.
In [18]: df.dtypes
Out[18]:
0 int64
1 object
2 int64
3 int64
4 object
5 object
6 int64
7 int64
8 object
9 object
10 int64
dtype: object
In [19]: df.replace('SUPP', 3, inplace=True)
In [20]: df.dtypes
Out[20]:
0 int64
1 object
2 int64
3 int64
4 object
5 object
6 int64
7 int64
8 object
9 object
10 int64
dtype: object
In [21]: df = df.convert_objects(convert_numeric=True)
In [22]: df.dtypes
Out[22]:
0 int64
1 object
2 int64
3 int64
4 object
5 object
6 int64
7 int64
8 int64
9 int64
10 int64
dtype: object
You need convert_objects because, even though you've replaced SUPP with 3, the other values in those columns are still strings, so the columns keep the object dtype until you convert them.
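Note for readers on newer pandas: convert_objects has since been deprecated and removed, so the session above will not run as written on a current install. A minimal sketch of the same idea using pd.to_numeric follows. It assumes the four sample rows from the question, uses io.StringIO as a stand-in for file.csv, and hard-codes the affected columns (8 and 9 in this sample) in a list I've called suppressed_cols, which is my own name for illustration. Here SUPP is replaced with the string '3' so the column holds only strings until the conversion step.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for file.csv.
csv_text = """931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
931,Oxfordshire,9316076,123298,Our Lady's Abingdon,Abingdon,57,57,SUPP,SUPP,16
"""

df = pd.read_csv(io.StringIO(csv_text), header=None)

# Assumption: in this sample only columns 8 and 9 contain 'SUPP'.
# In the full file other columns may need the same treatment.
suppressed_cols = [8, 9]

for col in suppressed_cols:
    # Swap the placeholder for a numeric string, then parse the whole
    # column. Replacing alone is not enough, because the remaining
    # values ('20', '6', ...) are still strings.
    df[col] = pd.to_numeric(df[col].replace('SUPP', '3'))

print(df.dtypes)
```

If treating suppressed values as missing is acceptable, passing na_values=['SUPP'] to read_csv is another option; the affected columns then come back as float64 with NaN in place of SUPP.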
Upvotes: 2