Reputation: 77
I am completely new at Python (this is my first assignment) and I am trying to take the first two digits of the D-column of the following dataframe and put those two digits in a new column F:
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'A' : [1, 1, 1, 4, 5, 3, 3, 4, 1, 4],
                    'B' : [8, 4, 3, 1, 1, 6, 4, 6, 9, 8],
                    'C' : [69, 82, 8, 25, 56, 79, 98, 68, 49, 82],
                    'D' : [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515],
                    'E' : ['Married', 'Single', 'Single', 'Divorced', 'Widow(er)', 'Single', 'Married', 'Divorced', 'Married', 'Widow(er)']})
I found several possible solutions here on Stack Overflow and tried to apply them, but none of them works for me. Either I get an error message (a different one depending on which solution I tried), or I do not get the result that I am expecting.
Upvotes: 4
Views: 19854
Reputation: 1777
Try this:
import math
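def first_two(d):
    # Integer-divide away everything after the two leading digits.
    # math.log10 is used because math.log(d, 10) can round to just below
    # a whole number when d is an exact power of ten.
    return d // 10 ** (int(math.log10(d)) - 1)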
df1['F'] = df1.D.apply(first_two)
output:
In [212]: df1
Out[212]:
   A  B   C     D          E   F
0  1  8  69  1663    Married  16
1  1  4  82  8818     Single  88
2  1  3   8  9232     Single  92
3  4  1  25  9643   Divorced  96
4  5  1  56  4900  Widow(er)  49
5  3  6  79  8568     Single  85
6  3  4  98  4975    Married  49
7  4  6  68  8938   Divorced  89
8  1  9  49  7513    Married  75
9  4  8  82  1515  Widow(er)  15
Most of the SO solutions use string slicing. This one uses math to do the "slice". The same calculation as a one-line lambda:
df1['F'] = df1.D.apply(lambda d: d // 10 ** (int(math.log10(d)) - 1))
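To make the arithmetic behind the "slice" explicit, here is a minimal sketch that splits the calculation into named steps. The function name `first_two_digits` and the sample values are illustrative assumptions, not part of the answer. The sketch assumes positive integers with at least two digits, and it uses `math.log10`, which the Python documentation describes as usually more accurate than `math.log(x, 10)`.

```python
import math


def first_two_digits(d):
    """Return the two leading digits of a positive integer with >= 2 digits.

    Hypothetical helper for illustration only.
    """
    magnitude = int(math.log10(d))    # 1663 -> 3 (one less than the digit count)
    divisor = 10 ** (magnitude - 1)   # 1663 -> 10 ** 2 = 100
    return d // divisor               # 1663 // 100 = 16


for value in (1663, 455, 31, 1000):
    print(value, first_two_digits(value))
```

For a two-digit input the divisor is 1, so the value is returned unchanged.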
I didn't include the timing setup, but the code being timed is as described above.
#string slice method
In [255]: print(t.timeit(100))
3.3840187825262547e-06
#'first_two' method
In [252]: print(t.timeit(100))
1.8120044842362404e-06
#'lambda' method
In [249]: print(t.timeit(100))
1.9049621187150478e-06
It is odd that calling the named function is faster than the lambda (?)
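Since the timing setup was not posted, the following is only a sketch of one way the three variants could be compared with `timeit`. The wrapper names `by_string`, `by_function` and `by_lambda` are assumptions made for this example, and the numbers it prints will differ by machine and will not match the figures above.

```python
import math
import timeit

import pandas as pd

df1 = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 4900, 8568, 4975, 8938, 7513, 1515]})


def first_two(d):
    # Integer-divide away everything after the two leading digits.
    return d // 10 ** (int(math.log10(d)) - 1)


def by_string():
    # String slice method: int -> str -> first two characters -> int.
    return df1.D.astype(str).str[:2].astype(int)


def by_function():
    # Named function applied element by element.
    return df1.D.apply(first_two)


def by_lambda():
    # Same arithmetic as first_two, written inline as a lambda.
    return df1.D.apply(lambda d: d // 10 ** (int(math.log10(d)) - 1))


for candidate in (by_string, by_function, by_lambda):
    seconds = timeit.timeit(candidate, number=100)
    print(f"{candidate.__name__}: {seconds:.6f} s for 100 runs")
```

With only 100 runs on a ten-row frame, a small gap between the function and lambda versions may simply be measurement noise. Repeating the measurement or using a larger frame would give a steadier comparison.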
Upvotes: 1
Reputation: 164623
Here's a solution using NumPy. It requires the numbers in D to have at least 2 digits.
df = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 31, 455, 43153, 45]})
df['F'] = df['D'] // np.power(10, np.log10(df['D']).astype(int) - 1)
print(df)
       D   F
0   1663  16
1   8818  88
2   9232  92
3   9643  96
4     31  31
5    455  45
6  43153  43
7     45  45
If all your numbers have 4 digits, you can simply use df['F'] = df['D'] // 100.
For larger dataframes, these numeric methods will be more efficient than converting integers to strings, extracting the first 2 characters and converting back to int.
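The NumPy expression above needs at least two digits in every value of D. As a hedged sketch of one possible extension (not part of the answer), the exponent can be clipped at zero so that single-digit values pass through unchanged. It still assumes strictly positive integers, because `np.log10` is not finite for zero and not defined for negative numbers. The sample data is made up for illustration.

```python
import numpy as np
import pandas as pd

# Illustrative data, including a single-digit value and an exact power of ten.
df = pd.DataFrame({'D': [1663, 31, 455, 43153, 7, 10]})

# Digit count minus two: how many trailing digits have to be dropped.
exponent = np.floor(np.log10(df['D'])).astype(int) - 1

# Single-digit values would give -1 here, so floor the exponent at zero.
exponent = exponent.clip(lower=0)

df['F'] = df['D'] // 10 ** exponent
print(df)
```

Clipping happens before the power is taken, so the integer power never receives a negative exponent.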
Upvotes: 2
Reputation: 776
You could use something like:
df1['F'] = df1.D.astype(str).str[:2].astype(int)
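To show what each link in that chain does, here is a small sketch with the steps split out. The intermediate names `as_text` and `leading` are assumptions for illustration, and only a few rows of D are used. It assumes non-negative integers, since a minus sign would count as the first character.

```python
import pandas as pd

df1 = pd.DataFrame({'D': [1663, 8818, 9232, 9643, 4900]})

as_text = df1['D'].astype(str)    # integers become strings: '1663', '8818', ...
leading = as_text.str[:2]         # keep the first two characters: '16', '88', ...
df1['F'] = leading.astype(int)    # convert back to integers: 16, 88, ...

print(df1)
```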
Upvotes: 7