MattR

Reputation: 5126

Take first 6 digits of Pandas Column

I have a task to take the first 6 digits of a column in pandas. However, when a number is less than 6 digits long, a decimal point is added to the end of it. Unfortunately, this is not acceptable for my needs later down the road.

I'm sure I can get rid of the decimal with various code, but it will probably be inefficient as DataFrames get larger.

Current code:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [np.nan,np.nan,3,4,5,5,3,1,5,np.nan],
                    'B' : [1,0,3,5,0,0,np.nan,9,0,0],
                    'C' : [10,0,30,50,0,0,4,10,1,0],
                    'D' : [123456,123456,1234567,12345678,12345,12345,12345678,123456789,1234567,np.nan],
                    'E' : ['Assign','Unassign','Assign','Ugly','Appreciate','Undo','Assign','Unicycle','Assign','Unicorn']})

wow2 = df1
wow2['D'] = wow2['D'][:6]
print(wow2)

     A    B   C       D           E
0  NaN  1.0  10  123456      Assign
1  NaN  0.0   0  123456    Unassign
2  3.0  3.0  30  123456      Assign
3  4.0  5.0  50  123456        Ugly
4  5.0  0.0   0  12345.  Appreciate <--- Notice Decimal
5  5.0  0.0   0  12345.        Undo <--- Notice Decimal
6  3.0  NaN   4     NaN      Assign
7  1.0  9.0  10     NaN    Unicycle
8  5.0  0.0   1     NaN      Assign
9  NaN  0.0   0     NaN     Unicorn

Is there a way I can leave the number as it is if its length is not over 6? I thought about converting the column to string and doing a loop, but I believe that would be wildly inefficient and create more problems than solutions.

Upvotes: 2

Views: 6442

Answers (1)

pansen

Reputation: 6663

To take 6 digits of a number without converting to string and back, you may use the modulo operator. Note that mod(10**6) keeps the last 6 digits rather than the first, so 1234567 becomes 234567. To represent your numeric values without a decimal point, you need to convert them into integers. However, mixing integers and NaN within the same column results in float64. To get around this (which is kind of ugly), you need to convert the integers into strings, which forces the dtype to be object because the column then mixes strings and float values.
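The dtype behaviour described above can be checked with a small sketch. The Series names here are made up for illustration, and the last step uses the pandas nullable "Int64" dtype, which is an alternative not used in this answer and assumes a reasonably recent pandas version.

```python
import numpy as np
import pandas as pd

# Whole numbers with no missing values keep an integer dtype.
ints = pd.Series([123456, 12345])

# A single NaN forces the whole column to float64, which is why
# 12345 ends up displayed with a decimal point.
with_nan = pd.Series([123456, 12345, np.nan])
print(ints.dtype, with_nan.dtype)

# Assumption: the nullable "Int64" dtype is available. It stores
# integers alongside a missing-value marker instead of using floats.
nullable = with_nan.astype("Int64")
print(nullable)
```

Whether the nullable dtype or the string conversion is preferable probably depends on what later code expects from column D.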

The solution looks like the following:

wow2['D'] = wow2['D'].mod(10**6)\
   .dropna()\
   .astype(int)\
   .astype(str)

print(wow2['D'])

0    123456
1    123456
2    234567
3    345678
4     12345
5     12345
6    345678
7    456789
8    234567
9       NaN
Name: D, dtype: object
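As the output shows, the modulo keeps the trailing digits (1234567 becomes 234567). If the leading 6 digits are what is needed, one possible sketch is to slice the string form of each number instead. It follows the same dropna / astype pattern as above; the Series d and the names valid and result are made up for illustration. Note also that wow2['D'][:6] in the question selects the first 6 rows of the column, not the first 6 digits of each value.

```python
import numpy as np
import pandas as pd

# Hypothetical sample values modelled on column D from the question.
d = pd.Series([123456, 1234567, 12345678, 12345, 123456789, np.nan])

# Drop the missing rows, convert to integers (removes the ".0"),
# then to strings so that .str[:6] slices characters per value.
valid = d.dropna().astype("int64").astype(str).str[:6]

# Restore the original index so the missing rows come back as NaN.
result = valid.reindex(d.index)
print(result)
```

This is vectorised through the .str accessor, so no explicit Python loop is needed; values shorter than 6 digits, such as 12345, are left unchanged.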

Upvotes: 3
