Reputation: 5126
I have a task to take the first 6 digits of a column in pandas. However, if this number is less than 6 digits long it adds a decimal to the end of the number. Unfortunately, this is not acceptable for my needs later down the road.
I'm sure I can get rid of the decimal with various code, but It will probably be inefficient as DataFrames get larger.
Current code:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [np.NaN,np.NaN,3,4,5,5,3,1,5,np.NaN],
'B' : [1,0,3,5,0,0,np.NaN,9,0,0],
'C' : [10,0,30,50,0,0,4,10,1,0],
'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.NaN],
'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn',]})
wow2 = df1
wow2['D'] = wow2['D'][:6]
print(wow2)
A B C D E
0 NaN 1.0 10 123456 Assign
1 NaN 0.0 0 123456 Unassign
2 3.0 3.0 30 123456 Assign
3 4.0 5.0 50 123456 Ugly
4 5.0 0.0 0 12345. Appreciate <--- Notice Decimal
5 5.0 0.0 0 12345. Undo <--- Notice Decimal
6 3.0 NaN 4 NaN Assign
7 1.0 9.0 10 NaN Unicycle
8 5.0 0.0 1 NaN Assign
9 NaN 0.0 0 NaN Unicorn
Is there a way I can leave the digit if it's length is not over 6? I thought about converting the column to string and doing a loop... But I believe that would be wildly inefficient and create more problems than solutions
Upvotes: 2
Views: 6442
Reputation: 6663
To get the first 6 digits of a number (without converting to string and back), you may use the modulo operator.
In order to represent your numeric values as non floating point numbers you need to convert them into integers. However, mixing integers and np.NaN within the same column will result into float64
(see here for more). To get around this (which is kind of ugly) you need to convert the integers into strings which forces the dtype to be object
because you mix strings and float values.
The solution looks like the following:
wow2['D'] = wow2['D'].mod(10**6)\
.dropna()\
.astype(int)\
.astype(str)
print(wow['D'])
0 123456
1 123456
2 234567
3 345678
4 12345
5 12345
6 345678
7 456789
8 234567
9 NaN
Name: D, dtype: object
Upvotes: 3