Simd

Reputation: 21353

Replace string by numerical value

I am reading in data with

df = pandas.read_csv("file.csv", names=['A','B','C','D','E','F','G', 'H','I','J', 'K'], header=None)

I get

df.dtypes
Out[54]: 
A     int64
B    object
C     int64
D     int64
E    object
F    object
G    object
H    object
I    object
J    object
K    object
dtype: object

The problem is that some fields in the original data have been replaced with the string SUPP when their value is less than 6 (but more than 0), so those columns are read as object instead of a numerical data type. I tried replacing them with

df.replace('SUPP', 3.0)

but I still don't get numerical data types.

Some typical input data looks like

931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
931,Oxfordshire,9316076,123298,Our Lady's Abingdon,Abingdon,57,57,SUPP,SUPP,16

The problem can be reproduced by just saving the example above as file.csv.

Upvotes: 0

Views: 210

Answers (1)

TomAugspurger

Reputation: 28956

EdChum almost had it in the comments.

In [18]: df.dtypes
Out[18]: 
0      int64
1     object
2      int64
3      int64
4     object
5     object
6      int64
7      int64
8     object
9     object
10     int64
dtype: object

In [19]: df.replace('SUPP', 3, inplace=True)

In [20]: df.dtypes
Out[20]: 
0      int64
1     object
2      int64
3      int64
4     object
5     object
6      int64
7      int64
8     object
9     object
10     int64
dtype: object

In [21]: df = df.convert_objects(convert_numeric=True)

In [22]: df.dtypes
Out[22]: 
0      int64
1     object
2      int64
3      int64
4     object
5     object
6      int64
7      int64
8      int64
9      int64
10     int64
dtype: object

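You need to call convert_objects because replace only swaps the matching values: even though SUPP is now 3, the other values in those columns are still strings, so the columns keep the object dtype until you convert them explicitly. Note also that df.replace returns a new DataFrame unless you pass inplace=True (as above) or assign the result back.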
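convert_objects was deprecated and then removed in later pandas releases, so on a current install the same conversion can be sketched with pandas.to_numeric. This is a minimal sketch, not the original answer's method: it assumes the sample rows from the question, reuses the question's column names A to K, and the list numeric_cols is a hand-picked assumption about which columns should be numeric.

```python
import io

import pandas as pd

# Sample rows copied from the question; io.StringIO stands in for file.csv.
csv_text = """931,Oxfordshire,9314125,123255,Larkmead School,Abingdon,125,124,20,SUPP,8
931,Oxfordshire,9314126,123256,John Mason School,Abingdon,164,164,25,6,16
931,Oxfordshire,9314127,123257,Fitzharrys School,Abingdon,150,149,9,0,11
931,Oxfordshire,9316076,123298,Our Lady's Abingdon,Abingdon,57,57,SUPP,SUPP,16
"""

df = pd.read_csv(io.StringIO(csv_text), header=None, names=list("ABCDEFGHIJK"))

# Assumption: these are the columns that should hold counts.
numeric_cols = ["G", "H", "I", "J", "K"]

for col in numeric_cols:
    # Swap the SUPP marker for 3, then parse the remaining strings as numbers.
    df[col] = pd.to_numeric(df[col].replace("SUPP", 3))

print(df.dtypes)
```

pandas.to_numeric raises on any value it cannot parse, so a marker other than SUPP would surface as an error rather than silently leaving the column as object.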

Upvotes: 2
