Foad S. Farimani

Reputation: 14016

Convert Arabic numbers in a Pandas column to integers

I have been given a survey to analyze. Unfortunately, some participants used Arabic/Farsi digits to fill in some values. For example:

import pandas as pd

df = pd.DataFrame(["24", "۱۲", "45", "۳۲"], columns=["age"])

I want to convert all the values to Python integers:

[24, 12, 45, 32]

What is the most canonical/performant way to do this?

Upvotes: 2

Views: 422

Answers (2)

AXO

Reputation: 9096

You can apply Python's built-in int function, which understands Arabic numerals (in fact, any Unicode decimal digit):

>>> from pandas import DataFrame
>>> df = DataFrame(["24", "۱۲", "45", "۳۲"], columns=["age"])
>>> df['age'] = df['age'].apply(int)
>>> df['age']
0    24
1    12
2    45
3    32
Name: age, dtype: int64
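A quick way to see why this works, using only the standard library: int() accepts characters from the Unicode "Nd" (decimal digit) category, and unicodedata can show that the Persian digits in the sample belong to it. The helper name describe is hypothetical, for illustration only.

```python
import unicodedata


def describe(s: str):
    """Return int(s) plus the Unicode name and category of each character.

    Hypothetical helper: it only illustrates that int() parses any
    decimal-digit characters (category "Nd"), not just ASCII 0-9.
    """
    details = [(unicodedata.name(ch), unicodedata.category(ch)) for ch in s]
    return int(s), details


# ASCII, Extended Arabic-Indic (Persian) and Arabic-Indic samples
for sample in ["24", "۱۲", "١٢"]:
    print(sample, describe(sample))
```

Each character reports category "Nd", which is why apply(int) needs no extra preprocessing for these inputs.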

In fact, NumPy/pandas dtypes are also aware of Unicode numerals, so the usual type casts work as well:

>>> import pandas as pd
>>> pd.Series(["24", "۱۲", "45", "۳۲"], dtype='float64')
0    24.0
1    12.0
2    45.0
3    32.0
dtype: float64
>>> pd.Series(["24", "۱۲", "45", "۳۲"]).astype('int64')
0    24
1    12
2    45
3    32
dtype: int64
>>> import numpy as np
>>> np.array(["24", "۱۲", "45", "۳۲"], dtype='int64')
array([24, 12, 45, 32], dtype=int64)
>>> np.array(["24", "۱۲", "45", "۳۲"]).astype('float16')
array([24., 12., 45., 32.], dtype=float16)
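If you would rather not depend on how a given NumPy/pandas version parses non-ASCII strings, one option (a swapped-in technique, not part of this answer) is to normalize the digits to ASCII first with a translation table and Series.str.translate, then cast. This sketch assumes the data only contains Arabic-Indic (U+0660 to U+0669) and Extended Arabic-Indic/Persian (U+06F0 to U+06F9) digits; the names DIGIT_TABLE and to_int_series are hypothetical.

```python
import pandas as pd

# Hypothetical table: map Arabic-Indic (U+0660-U+0669) and
# Extended Arabic-Indic / Persian (U+06F0-U+06F9) digits to ASCII 0-9.
ASCII = "0123456789"
ARABIC_INDIC = "".join(chr(0x0660 + i) for i in range(10))
PERSIAN = "".join(chr(0x06F0 + i) for i in range(10))
DIGIT_TABLE = str.maketrans(ARABIC_INDIC + PERSIAN, ASCII * 2)


def to_int_series(s: pd.Series) -> pd.Series:
    """Translate the digits to ASCII, then cast the strings to int64."""
    return s.str.translate(DIGIT_TABLE).astype("int64")


df = pd.DataFrame(["24", "۱۲", "45", "۳۲"], columns=["age"])
df["age"] = to_int_series(df["age"])
print(df["age"].tolist())
```

Digits from other scripts are left untouched by the table, so astype would still raise on them; extend the table if your data needs it.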

Upvotes: 2

rafaelc

Reputation: 59274

First run your values through unidecode, then convert them with pd.to_numeric:

pip install unidecode

import pandas as pd
from unidecode import unidecode

df = pd.DataFrame(["24", "۱۲", "45", "۳۲"], columns=["age"])
df['numbers'] = pd.to_numeric(df.age.apply(unidecode), errors='coerce')

  age  numbers
0  24       24
1  ۱۲       12
2  45       45
3  ۳۲       32
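The same idea can be sketched without the third-party dependency, assuming you only need digits (not general transliteration): the standard library's unicodedata.decimal gives the value of any Unicode decimal digit, and errors='coerce' turns anything still unparseable into NaN. The helper ascii_digits and the extra "n/a" row are hypothetical, added only to show the coercion.

```python
import unicodedata

import pandas as pd


def ascii_digits(text: str) -> str:
    """Replace every Unicode decimal digit with its ASCII equivalent."""
    out = []
    for ch in text:
        # decimal() returns the digit's value, or the default (None here)
        # for characters that are not decimal digits; those are kept as-is.
        value = unicodedata.decimal(ch, None)
        out.append(str(value) if value is not None else ch)
    return "".join(out)


# "n/a" is a made-up invalid entry to demonstrate errors="coerce".
df = pd.DataFrame(["24", "۱۲", "45", "۳۲", "n/a"], columns=["age"])
df["numbers"] = pd.to_numeric(df["age"].map(ascii_digits), errors="coerce")
print(df)
```

Because the invalid row becomes NaN, the resulting column is float64 rather than int64.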

Upvotes: 3
