Reputation: 3375
I have a very large column of phone numbers in a pandas dataframe, and they're in float format, e.g. 3.52831E+11. There are also NaNs present.
I am trying to convert the numbers to int and it's throwing an error that NaNs can't be converted to int. Fair enough. But I can't seem to get around this.
Here's a sample:
import numpy as np
import pandas as pd
df = pd.DataFrame({'number':['3.578724e+11','3.568376e+11','3.538884e+11',np.nan]})
number
0 3.578724e+11
1 3.568376e+11
2 3.538884e+11
3 NaN
# My first attempt: I try to convert them with int(), but I get 'cannot convert float NaN to integer'.
df['number'] = [int(x) for x in df['number'] if isinstance(x, float)]
# I have also tried the below, but I get SyntaxError: invalid syntax.
df['number'] = [int(x) for x in df['number'] if x not None]
# and then this one, but the error is: TypeError: must be real number, not str
df['number'] = [int(x) for x in df['number'] if not math.isnan(x) and isinstance(x, float)]
I'd appreciate some pointers on this. I thought at least one of these would work.
Thanks folks
Upvotes: 1
Views: 429
Reputation: 402493
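Since pandas 0.24, there is a nullable integer type (Int64) that can hold missing values alongside integers. The first step is to convert your strings (objects) to float, then to nullable int. Note that pandas 1.0 and later print the missing entry as <NA> rather than NaN.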
df.astype('float').astype(pd.Int64Dtype())
number
0 357872400000
1 356837600000
2 353888400000
3 NaN
As a shorthand, you can pass the dtype name as a string (note the capital "I"):
df.astype('float').astype('Int64')
number
0 357872400000
1 356837600000
2 353888400000
3 NaN
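If you want to write the result back into the column, and also guard against stray non-numeric strings, something along these lines may help. This is only a sketch built on the sample frame from the question; the use of `pd.to_numeric` with `errors='coerce'` is my own substitution for `astype('float')`, not something the question requires, and `as_float` is just an illustrative name.

```python
import numpy as np
import pandas as pd

# Sample frame from the question: numeric strings plus one missing value
df = pd.DataFrame({'number': ['3.578724e+11', '3.568376e+11', '3.538884e+11', np.nan]})

# Parse the strings as floats; errors='coerce' turns anything unparseable into NaN
as_float = pd.to_numeric(df['number'], errors='coerce')

# Cast to the nullable integer dtype; the missing entry stays missing
df['number'] = as_float.astype('Int64')

print(df['number'].dtype)
print(df['number'].isna().sum())
```

The cast to 'Int64' only succeeds when every non-missing float is a whole number, which is the case for these phone numbers.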
On older versions, your only option will be to drop NaNs and convert:
df.dropna(subset=['number']).astype({'number':float}).astype({'number':int})
number
0 357872400000
1 356837600000
2 353888400000
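A slightly more defensive variant of the drop-and-convert route, again sketched on the question's sample data. I'm assuming here that you want to keep the original frame intact (hence the illustrative `clean` copy), and I spell out 'int64' because on some platforms (for example Windows with older NumPy) a plain `int` can map to a 32-bit integer, which is too small for 12-digit numbers.

```python
import numpy as np
import pandas as pd

# Sample frame from the question: numeric strings plus one missing value
df = pd.DataFrame({'number': ['3.578724e+11', '3.568376e+11', '3.538884e+11', np.nan]})

# Drop rows where 'number' is missing; copy so the original frame is untouched
clean = df.dropna(subset=['number']).copy()

# Go str -> float -> 64-bit int; explicit 'int64' avoids a 32-bit default
clean['number'] = clean['number'].astype(float).astype('int64')

print(clean)
```

Keep in mind that this drops the NaN rows entirely, so `clean` has fewer rows than `df`.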
Upvotes: 1