SamR

Reputation: 77

How to extract the first 2 digits of all numbers in a column of a dataframe?

I am completely new to Python (this is my first assignment), and I am trying to take the first two digits of each number in column D of the following dataframe and put those two digits in a new column F:

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [1, 1, 1, 4, 5, 3, 3, 4, 1, 4], 
                    'B' : [8, 4, 3, 1, 1, 6, 4, 6, 9, 8], 
                    'C' : [69,82,8,25,56,79,98,68,49,82], 
                    'D' : [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515],
                    'E' : ['Married','Single','Single','Divorced','Widow(er)','Single','Married','Divorced','Married','Widow(er)']})

I found several possible solutions here on Stack Overflow and tried to apply them, but none of them works for me. Either I get an error message (a different one depending on which solution I try), or I do not get the result that I expect.

Upvotes: 4

Views: 19854

Answers (3)

Bill Armstrong

Reputation: 1777

Try this:

import math

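def first_two(d):
    # Divide by 10 ** (number of digits - 2). math.log10 avoids the rounding
    # error math.log(d, 10) can show at exact powers of ten (e.g. 1000).
    # Assumes d is a positive integer with at least 2 digits.
    return d // 10 ** (int(math.log10(d)) - 1)

df1['F'] = df1.D.apply(first_two)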

Output:

In [212]: df1
Out[212]: 
   A  B   C     D          E   F
0  1  8  69  1663    Married  16
1  1  4  82  8818     Single  88
2  1  3   8  9232     Single  92
3  4  1  25  9643   Divorced  96
4  5  1  56  4900  Widow(er)  49
5  3  6  79  8568     Single  85
6  3  4  98  4975    Married  49
7  4  6  68  8938   Divorced  89
8  1  9  49  7513    Married  75
9  4  8  82  1515  Widow(er)  15

Most of the Stack Overflow solutions use string slicing; this one uses arithmetic to do the "slice".
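The arithmetic behind this "slice": a positive integer d has int(log10(d)) + 1 digits, so integer-dividing by 10 to the power (number of digits - 2) drops everything after the first two. If you would rather avoid floating-point logarithms altogether, here is a minimal sketch using only integer division. The helper name first_two_int is hypothetical (not from this answer), it assumes non-negative integers, and it rebuilds only column D of the question's df1:

```python
import pandas as pd

def first_two_int(d):
    """Hypothetical helper: first two digits of a non-negative integer.

    Repeatedly drops the last digit until at most two digits remain,
    so no floating-point logarithm is involved. One- and two-digit
    numbers are returned unchanged.
    """
    d = int(d)
    while d >= 100:
        d //= 10
    return d

df1 = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515]})
df1['F'] = df1['D'].apply(first_two_int)
print(df1['F'].tolist())
```

Unlike the logarithm version, this also handles one-digit values (7 stays 7), at the cost of a Python-level loop per value.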

Or with a lambda function:

df1['F'] = df1.D.apply(lambda d: d // 10 ** (int(math.log10(d)) - 1))
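A caveat on the choice of logarithm: math.log(d, 10) is computed as a ratio of natural logarithms, so for an exact power of ten it can land just below the whole number (math.log(1000, 10) gives 2.9999999999999996), and int() then truncates it. The Python docs note that math.log10 is usually more accurate. A quick comparison of the same formula with each call (both helper names are hypothetical):

```python
import math

def first_two_log(d):
    # Formula using math.log(d, 10)
    return d // 10 ** (int(math.log(d, 10)) - 1)

def first_two_log10(d):
    # Same formula using math.log10, which returns 3.0 for 1000 in practice
    return d // 10 ** (int(math.log10(d)) - 1)

for d in (1000, 1663):
    print(d, first_two_log(d), first_two_log10(d))
```

With math.log, 1000 yields 100 instead of 10; both versions agree on 1663.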

Efficiency (measured with timeit; the setup is not shown, but it is as described above):

#string slice method
In [255]: print(t.timeit(100))
3.3840187825262547e-06

#'first_two' method
In [252]: print(t.timeit(100))
1.8120044842362404e-06

#'lambda' method
In [249]: print(t.timeit(100))
1.9049621187150478e-06

It is odd that calling the named function is slightly faster (about 5%) than the lambda (?)

Upvotes: 1

jpp

Reputation: 164623

Here's a solution using NumPy. It requires all numbers in D to be positive and to have at least 2 digits.

df = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 31, 455, 43153, 45]})

df['F'] = df['D'] // np.power(10, np.log10(df['D']).astype(int) - 1)

print(df)

       D   F
0   1663  16
1   8818  88
2   9232  92
3   9643  96
4     31  31
5    455  45
6  43153  43
7     45  45
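The two-digit requirement comes from the exponent: for a one-digit value it becomes -1, and NumPy raises an error for integer powers with negative exponents. One possible workaround (an adjustment, not part of the original answer) is to clip the exponent at 0 with np.maximum so one-digit values pass through unchanged; this sketch still assumes positive integers:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': [1663, 8818, 7, 31, 455, 43153, 1000]})

# Power of ten to divide by: (number of digits - 2), but never below 0,
# so 1- and 2-digit values are divided by 10 ** 0 == 1.
exponent = np.maximum(np.log10(df['D']).astype(int) - 1, 0)
df['F'] = df['D'] // np.power(10, exponent)
print(df)
```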

If all your numbers have 4 digits, you can simply use df['F'] = df['D'] // 100.
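The // 100 shortcut silently gives wrong answers for values that are not four digits long (for example 455 // 100 is 4), so a sketch that checks the assumption before using it might look like this:

```python
import pandas as pd

df1 = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515]})

# The // 100 shortcut is only correct for four-digit values, so check first.
if not df1['D'].between(1000, 9999).all():
    raise ValueError('column D contains values that are not four digits long')
df1['F'] = df1['D'] // 100
```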

For larger dataframes, these numeric methods will be more efficient than converting integers to strings, extracting the first 2 characters and converting back to int.
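To check the efficiency claim on your own machine, here is a rough benchmark sketch. The data (100,000 random integers with 2 to 6 digits), the seed, and the repeat counts are arbitrary assumptions; it first confirms both methods agree, then reports the best time per call for each:

```python
import timeit

import numpy as np
import pandas as pd

# Assumed benchmark data: 100,000 random positive integers with 2 to 6 digits
rng = np.random.default_rng(0)
df = pd.DataFrame({'D': rng.integers(10, 1_000_000, size=100_000)})

def via_numeric(s):
    # Vectorised: divide by 10 ** (number of digits - 2)
    return s // np.power(10, np.log10(s).astype(int) - 1)

def via_string(s):
    # Convert to str, keep the first two characters, convert back to int
    return s.astype(str).str[:2].astype(int)

# Sanity check: both methods must produce the same values
assert (via_numeric(df['D']).to_numpy() == via_string(df['D']).to_numpy()).all()

for fn in (via_numeric, via_string):
    best = min(timeit.repeat(lambda: fn(df['D']), number=5, repeat=3))
    print(f'{fn.__name__:12s} {best / 5 * 1000:.1f} ms per call')
```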

Upvotes: 2

Kumar

Reputation: 776

You could use something like:

df1['F'] = df1.D.astype(str).str[:2].astype(int)
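One thing to be aware of with the string version: it slices characters, not digits, so a negative number keeps its minus sign and only one digit ('-4521' becomes '-4'). The question's data is all positive, but if negative values can occur, one sketch (an assumption beyond the original question) is to slice the absolute value and restore the sign afterwards:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'D': [1663, 8818, 7, -4521]})

# Plain slicing: the '-' counts as one of the two characters
naive = df['D'].astype(str).str[:2].astype(int)

# Slice the absolute value, then multiply by the sign (-1, 0 or 1)
first_two = df['D'].abs().astype(str).str[:2].astype(int)
df['F'] = first_two * np.sign(df['D'])
print(df)
```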

Upvotes: 7
