MattR

Reputation: 5126

Trouble passing basic function in Pandas

I have a very basic function that takes the first six letters of a string. I want to apply it to a column in my DataFrame.

code:

import pandas as pd
import numpy as np

# np.nan is used because the np.NaN alias was removed in NumPy 2.0
dfp = pd.DataFrame({'A': [np.nan, np.nan, 3, 4, 5, 5, 3, 1, 5, np.nan],
                    'B': [1, 0, 3, 5, 0, 0, np.nan, 9, 0, 0],
                    'C': ['AA1233445', 'A9875', 'rmacy', 'Idaho Rx', 'Ab123455', 'TV192837', 'RX', 'Ohio Drugs', 'RX12345', 'USA Pharma'],
                    'D': [123456, 123456, 1234567, 12345678, 12345, 12345, 12345678, 123456789, 1234567, np.nan],
                    'E': ['Assign', 'Unassign', 'Assign', 'Ugly', 'Appreciate', 'Undo', 'Assign', 'Unicycle', 'Assign', 'Unicorn']})
def six_dig(thing):
    return str(thing)[:6]

dfp6= dfp[dfp['C'].apply(six_dig, axis=1)]

But I get: TypeError: six_dig() got an unexpected keyword argument 'axis'. I even tried using .map() but got the same error.

If I remove axis=1, I get: KeyError: ["STUFF"] not in index

I must be missing something super simple as I've used functions on DataFrame columns before...

Upvotes: 2

Views: 70

Answers (3)

MaxU - stand with Ukraine

Reputation: 210842

If you want a vectorized approach, here is an example that wraps the .str.slice() accessor:

In [35]: def my_slice(ser, start=0, end=10, step=1):
    ...:     return ser.str.slice(start, end, step)
    ...:

In [36]: my_slice(dfp.C, end=6)
Out[36]:
0    AA1233
1     A9875
2     rmacy
3    Idaho
4    Ab1234
5    TV1928
6        RX
7    Ohio D
8    RX1234
9    USA Ph
Name: C, dtype: object

Upvotes: 1

pansen

Reputation: 6663

Using your example, the following works just fine:

print(dfp['C'].map(six_dig))
0    AA1233
1     A9875
2     rmacy
3    Idaho 
4    Ab1234
5    TV1928
6        RX
7    Ohio D
8    RX1234
9    USA Ph
Name: C, dtype: object
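
As a rough sketch of why the two errors in the question appear (this is my reading, not something stated in the question): Series.apply forwards extra keyword arguments to the function itself, so axis=1 ends up in six_dig; and without axis, wrapping the result in dfp[...] makes pandas look up the sliced strings as column labels. The three-row frame and the column name C6 below are made up for illustration.

```python
import pandas as pd

def six_dig(thing):
    return str(thing)[:6]

# Small stand-in for the question's DataFrame (illustrative data only)
dfp = pd.DataFrame({'C': ['AA1233445', 'A9875', 'Idaho Rx']})

# Series.apply passes unknown keyword arguments on to six_dig,
# so axis=1 triggers the TypeError from the question.
try:
    dfp['C'].apply(six_dig, axis=1)
    apply_error = None
except TypeError as exc:
    apply_error = exc

# Without axis the call succeeds, but dfp[...] then treats the
# resulting strings as column labels, which raises KeyError.
try:
    dfp[dfp['C'].apply(six_dig)]
    lookup_error = None
except KeyError as exc:
    lookup_error = exc

# Assigning the result to a new column (hypothetical name 'C6') avoids both.
dfp['C6'] = dfp['C'].map(six_dig)
```

If the goal was a new column rather than a filter, the last line is probably what was intended.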

Upvotes: 2

Fabio Lamanna

Reputation: 21552

I think you can just use the .str accessor:

dfp6 = dfp['C'].str[:6]

This returns:

In [14]: dfp6
Out[14]: 
0    AA1233
1     A9875
2     rmacy
3    Idaho 
4    Ab1234
5    TV1928
6        RX
7    Ohio D
8    RX1234
9    USA Ph
Name: C, dtype: object
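
One difference worth knowing, shown here as a small sketch on made-up data (column C in the question has no missing values, so it does not show up there): .str[:6] leaves NaN as NaN, while a str(x)[:6] function like six_dig turns it into the text 'nan'.

```python
import numpy as np
import pandas as pd

# Illustrative series with one missing value (not from the question's data)
ser = pd.Series(['AA1233445', np.nan, 'RX'], dtype=object)

# The .str accessor skips missing values and keeps them as NaN
via_str = ser.str[:6]

# A plain Python function sees NaN as a float and stringifies it
via_map = ser.map(lambda x: str(x)[:6])
```

So if the column can contain missing values, the .str version is usually the safer choice.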

Upvotes: 5
