Rishi Deorukhkar

Reputation: 189

Cannot convert pandas column from string to int

The column below in my DataFrame needs to be converted to int:

dsAttendEnroll.District.head()

0    DISTRICT 01
1    DISTRICT 02
2    DISTRICT 03
3    DISTRICT 04
4    DISTRICT 05
Name: District, dtype: object

Using astype gives the error below. How can this be done?

dsAttendEnroll.District = dsAttendEnroll.District.map(lambda x: x[-2:]).astype(int)

ValueError: invalid literal for long() with base 10: 'LS'

Upvotes: 2

Views: 2269

Answers (2)

jezrael

Reputation: 863431

You can use split and select the second element of each resulting list with str[1], then pass the result to to_numeric with the parameter errors='coerce', which converts non-numeric values to NaN:

print (df)
      District
0  DISTRICT 01
1  DISTRICT 02
2  DISTRICT 03
3  DISTRICT 04
4  DISTRICT 05
5  DISTRICT LS

print (df.District.str.split().str[1])
0    01
1    02
2    03
3    04
4    05
5    LS
Name: District, dtype: object

print (pd.to_numeric(df.District.str.split().str[1], errors='coerce'))
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    NaN
Name: District, dtype: float64

Another solution is to slice the last 2 characters:

print (df.District.str[-2:])
0    01
1    02
2    03
3    04
4    05
5    LS
Name: District, dtype: object

print (pd.to_numeric(df.District.str[-2:], errors='coerce'))
0    1.0
1    2.0
2    3.0
3    4.0
4    5.0
5    NaN
Name: District, dtype: float64
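Both results above are float64 because NaN cannot be stored in a plain int column. If an integer dtype is required, the following is a minimal sketch of two possible follow-ups, assuming a reasonably recent pandas (the nullable "Int64" dtype needs 0.24 or later). The sample frame and the variable names are illustrative only:

```python
import pandas as pd

# Sample data mirroring the frame printed above (illustrative only).
df = pd.DataFrame({"District": ["DISTRICT 01", "DISTRICT 02", "DISTRICT 03",
                                "DISTRICT 04", "DISTRICT 05", "DISTRICT LS"]})

# Non-numeric suffixes such as 'LS' become NaN, so the dtype is float64.
numbers = pd.to_numeric(df["District"].str[-2:], errors="coerce")

# Option 1: keep every row and use the nullable integer dtype,
# which stores the missing value as <NA> instead of NaN.
as_nullable_int = numbers.astype("Int64")

# Option 2: drop the rows that could not be parsed, then cast to plain int.
as_plain_int = numbers.dropna().astype(int)

print(as_nullable_int)
print(as_plain_int)
```

Option 1 preserves the row count, which matters when assigning back to the original column; option 2 silently shortens the Series, so it suits cases where the non-numeric districts are meant to be excluded.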

Upvotes: 3

ℕʘʘḆḽḘ

Reputation: 19395

You can try:

dsAttendEnroll.District=pd.to_numeric(dsAttendEnroll.District)
dsAttendEnroll.District=dsAttendEnroll.District.astype(int)

Have a look at the pandas.to_numeric documentation.
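Note that pd.to_numeric with its default errors='raise' raises a ValueError on a value such as 'DISTRICT 01', so the numeric part has to be isolated first. As a hedged sketch, assuming the number always sits at the end of the string, str.extract can pull out the digits and also reveal which rows have none. The sample frame and the names digits, district_num and bad_rows are illustrative, not part of the question:

```python
import pandas as pd

# Small sample in the shape of the question's column (illustrative only).
df = pd.DataFrame({"District": ["DISTRICT 01", "DISTRICT 02", "DISTRICT LS"]})

# Capture the trailing run of digits; rows without one yield NaN.
digits = df["District"].str.extract(r"(\d+)$", expand=False)

# Convert the captured strings to numbers; NaN stays NaN.
district_num = pd.to_numeric(digits, errors="coerce")

# List the original values that have no numeric suffix,
# which are the ones that make a direct astype(int) fail.
bad_rows = df.loc[district_num.isna(), "District"]

print(district_num)
print(bad_rows)
```

Inspecting bad_rows first makes it easier to decide whether those districts should be dropped, mapped to a sentinel value, or kept as missing.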

Upvotes: 3
